Genomic Data Privacy: Enabling a Genomic Revolution

Data security and compliance are recurring themes in the many conversations I have with researchers and organisations in life sciences. This isn’t surprising, considering the vertiginous amount of data being produced in the genomics space (check out my previous article on how big data is impeding genomics progress). To ensure genomic data privacy, it is crucial for stakeholders to adopt robust security measures.

Ensuring genomic data privacy is essential for maintaining trust in scientific research.

To foster meaningful scientific exploration, it is essential that researchers have access to vast quantities of distributed genomic data. At the same time, to protect individuals’ privacy, researchers must take exceptional measures to anonymise and de-identify participants’ genomic data to comply with relevant regulations such as the European Union’s General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor rule in the United States. This conundrum is posing serious challenges for pharmaceutical and genomics-driven organisations today.

To navigate this complex landscape, organisations must prioritise stringent security protocols to safeguard genomic data. Scientific progress hinges on researchers’ access to extensive distributed genomic information, while upholding individuals’ privacy requires rigorous anonymisation and de-identification to align with regulations such as the GDPR and the HIPAA Safe Harbor rule. This balance between data use and privacy protection is a significant hurdle for genomics-focused entities today, and understanding it is crucial to advancing research.

Understanding Genomic Data Privacy in Research

The risk of re-identification makes genomic data privacy a critical issue for researchers and participants alike, and as the industry evolves it must remain a priority to protect individuals’ rights. Consumers who take direct-to-consumer (DTC) genetic tests should also be aware of how their genomic data is managed, a question that concerns many individuals.

Anonymisation and de-identification of patient data are incredibly difficult to ensure. At its core, genomic data can never be truly anonymised because each person’s genetic code is unique. In fact, it has been demonstrated that the re-identification of ‘anonymised’ data is possible: researchers were able to identify more than 40% of participants in the Personal Genome Project (PGP) by name using genomic data, genealogical data and public records. This highlights the importance of maintaining genomic data privacy while progressing in research.

Maintaining genomic data privacy is vital for sustaining public trust in scientific research, and privacy concerns will ultimately affect the broader acceptance of genetic research. Addressing them is paramount as the field advances, so that individuals’ rights are safeguarded while innovation continues.

Federated solutions enhance genomic data privacy by minimising the movement of sensitive information. Our mission is to improve genomic data privacy while facilitating secure analysis, and understanding how to protect it is crucial for all stakeholders.

And genomic data security concerns are not limited to participants in research projects. If you have ever taken a direct-to-consumer (DTC) genetic test from companies such as 23andMe and Ancestry, you should also be concerned about who has access to your genetic data.

DTC genetic testing companies are compiling huge databases with their customers’ genetic data, often enriched with personal health information. It has recently been estimated that over 60% of Americans of Northern European descent, the primary customer segment buying genetic kits, can be identified through such databases. Within two to three years, 90% of Americans of European descent will be identifiable from their DNA. Further, as genomic information is shared among blood relatives, the identification of one family member may affect the entire family’s privacy.

Pharmaceutical companies also want in on the action – these data treasure troves underpin a great deal of drug and therapy development. Just last year, GlaxoSmithKline purchased a $300 million stake in 23andMe, giving the pharmaceutical giant access to the genetic data of roughly 10 million individuals who have submitted a 23andMe DNA kit.

So, our genomic data is never 100% safe. The risks are amplified when one organisation shares sensitive data with another, because every movement of data increases the chance that it will be intercepted by unintended third parties. A Cambridge Analytica-type data scandal in genomics would fundamentally erode critical public trust and throw life sciences back a decade. If patients and consumers are scared to have their DNA analysed, this will undermine the basis of modern research, with disastrous effects for those urgently needing new and better treatments.

The question I ask people in life sciences is this – why should personal and sensitive data ever need to be copied and/or transferred in the first place?

The answer is – it doesn’t, and it shouldn’t.

Genetic data privacy and compliance can be ensured when federated data analysis methods are employed, allowing researchers to build their analyses on top of secure multi-party computation systems. Essentially, data never moves. Instead of relying on certifications ‘guaranteeing’ safe handling of classified data, a federated approach allows two or more parties in a distributed system to perform secure analysis without exposing private data to risks. I call this security-by-design. Federation builds on zero-trust best practices to minimise potential threats and data compliance breaches.
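To make the idea of secure multi-party computation a little more concrete, here is a minimal sketch of one classic building block, additive secret sharing, used to compute a joint total without any party revealing its own number. It is an illustration only, not a description of any particular product: the function names, the modulus and the example counts are all assumptions chosen for the sketch.

```python
import secrets

# Assumption: a large prime modulus for share arithmetic (illustrative choice).
PRIME = 2**61 - 1


def make_shares(value, n_parties):
    """Split `value` into n additive shares that sum to `value` modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values):
    """Simulate a secure sum in which no party sees another party's raw value."""
    n = len(private_values)
    # Each party splits its private value and hands one share to every party.
    all_shares = [make_shares(v, n) for v in private_values]
    # Party j adds up the shares it received; each share alone looks random.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Only the partial sums are published; together they give the true total.
    return sum(partial_sums) % PRIME


# Hypothetical example: three sites each hold a private carrier count.
site_counts = [120, 45, 310]
total = secure_sum(site_counts)
print(total)
```

In this sketch each site learns the combined total but only ever sees random-looking shares of the other sites' counts. A real deployment would also need authenticated channels and protection against parties that deviate from the protocol.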

Scaling genetic big data analyses to millions of individuals is an overwhelming challenge, and security and compliance are paramount considerations. Lifebit CloudOS eliminates this challenge in a federated way. Unlike any other genomics platform, with Lifebit CloudOS, analyses run over distributed data where the data resides rather than having to move data. Organisations that have deployed Lifebit CloudOS tell me they no longer worry about the very real risks that moving and sharing sensitive data posed prior to adopting Lifebit’s federated solution.

We engineered Lifebit CloudOS as a fully federated multiparty system that allows genomics data to stay in your own secure cloud, HPC or hybrid IT environment. Lifebit CloudOS does not ingest or transfer any data, metadata or log information. By applying highly optimised privacy-preserving and secure computation techniques to safeguard genomic data sharing and analysis, we are now much closer to solving the issue of genetic data privacy and compliance.
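The general pattern of running an analysis where the data resides can be sketched in a few lines. The example below is a toy illustration under stated assumptions, not Lifebit CloudOS code: the `Site` class, the site names and the genotype values are all hypothetical. Each site computes a local summary, and only those aggregate counts are combined.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A hypothetical data holder; `genotypes` never leaves the site."""
    name: str
    genotypes: list  # alternate-allele count per participant: 0, 1 or 2

    def local_summary(self):
        # Runs inside the site's own environment and returns aggregates only.
        return {
            "alt_alleles": sum(self.genotypes),
            "total_alleles": 2 * len(self.genotypes),
        }


def federated_allele_frequency(sites):
    """Combine per-site aggregates into a pooled alternate-allele frequency."""
    summaries = [site.local_summary() for site in sites]
    alt = sum(s["alt_alleles"] for s in summaries)
    total = sum(s["total_alleles"] for s in summaries)
    return alt / total


# Made-up data for two hypothetical sites.
sites = [
    Site("hospital_a", [0, 1, 2, 0]),
    Site("biobank_b", [1, 1, 0, 0, 0, 2]),
]
freq = federated_allele_frequency(sites)
print(f"Pooled alternate-allele frequency: {freq:.2f}")
```

The coordinator here only ever handles two integers per site. Aggregates from very small cohorts can still leak information, so a production system would typically enforce minimum cohort sizes or add further protections on top of this pattern.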

If you would like to learn how Lifebit can help strengthen your data security and compliance protocols, or just want to chat about life sciences, please drop me a line at thorben@lifebit.ai
