How to enable a genomic revolution without risking a data privacy catastrophe
Data security and compliance are recurring themes in the many conversations I have with researchers and organisations in life sciences. This isn’t surprising, given the staggering volume of data now being produced in genomics (check out my previous article on how big data is impeding genomics progress).
To foster meaningful scientific exploration, researchers need access to vast quantities of distributed genomic data. At the same time, to protect individuals’ privacy, they must take exceptional measures to anonymise and de-identify participants’ genomic data in order to comply with regulations such as the EU’s General Data Protection Regulation (GDPR) and the Safe Harbor de-identification method under the US Health Insurance Portability and Accountability Act (HIPAA). This tension poses serious challenges for pharmaceutical and genomics-driven organisations today.
Anonymisation and de-identification of patient data are extremely difficult to guarantee. Genomic data can never be truly anonymised, because each person’s genetic code is unique. Re-identification of ‘anonymised’ data has already been demonstrated: researchers were able to identify more than 40% of participants in the Personal Genome Project (PGP) by name using genomic data, genealogical data and public records.
And genomic data security concerns are not limited to research participants. If you have ever taken a direct-to-consumer (DTC) genetic test from a company such as 23andMe or Ancestry, you should also be concerned about who has access to your genetic data.
DTC genetic testing companies are compiling huge databases of their customers’ genetic data, often enriched with personal health information. It has recently been estimated that over 60% of Americans of Northern European descent, the primary customer segment for genetic kits, can be identified through such databases, and that within two to three years 90% of Americans of European descent will be identifiable from their DNA. Further, because genomic information is shared among blood relatives, identifying one family member can compromise the privacy of the entire family.
Pharmaceutical companies also want in on the action: these data treasure troves underpin a great deal of drug and therapy development. Just last year, GlaxoSmithKline purchased a $300 million stake in 23andMe, giving the pharmaceutical giant access to the genetic data of roughly 10 million individuals who have submitted a 23andMe DNA kit.
So, our genomic data is never 100% safe. The risks grow when one organisation shares sensitive data with another, because every transfer creates another opportunity for interception by unintended third parties. A Cambridge Analytica-style scandal in genomics would fundamentally erode public trust and set life sciences back a decade. If patients and consumers become afraid to have their DNA analysed, the foundation of modern research will suffer, with disastrous consequences for those who urgently need new and better treatments.
The question I ask people in life sciences is this: why should personal and sensitive data ever need to be copied or transferred in the first place?
The answer: it doesn’t, and it shouldn’t.
Genetic data privacy and compliance are ensured when federated data analysis methods are employed, because they allow researchers to run analyses on top of secure multi-party computation systems. Essentially, the data never moves. Instead of relying on certifications ‘guaranteeing’ the safe handling of sensitive data, a federated approach allows two or more parties in a distributed system to perform secure analysis without exposing private data to risk. I call this security-by-design. Federation builds on zero-trust best practices to minimise potential threats and data compliance breaches.
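To make the idea of computing on data where it resides a little more concrete, here is a minimal Python sketch of one common building block of secure multi-party computation, additive secret sharing, applied to a federated allele-frequency calculation. It is purely illustrative and is not how Lifebit CloudOS is implemented: the site names, genotype values, the `PRIME` modulus and all function names are hypothetical, and the network exchange of shares between parties is simulated inside a single process. Each site reduces its genotypes to two aggregate counts locally, splits those counts into random shares, and only the pooled totals are ever reconstructed.

```python
import secrets
from typing import Dict, List

# Hypothetical modulus for additive secret sharing: any prime larger than the
# largest possible aggregate count works. Not taken from any real system.
PRIME = 2**61 - 1


def local_allele_counts(genotypes: List[int]) -> Dict[str, int]:
    """Runs inside a site's own environment (hypothetical helper).

    genotypes: per-individual count of the alternate allele (0, 1 or 2)
    at a single variant. Only these two aggregates leave the function.
    """
    return {"alt": sum(genotypes), "total": 2 * len(genotypes)}


def split_into_shares(value: int, n_parties: int) -> List[int]:
    """Split value into n random shares that sum to value modulo PRIME.

    Any n-1 of the shares are uniformly random and reveal nothing about value.
    """
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def federated_allele_frequency(sites: Dict[str, List[int]]) -> float:
    """Pool allele counts across sites without revealing any site's counts.

    In this simulation every site also acts as one compute party; in a real
    deployment each share would be sent to a separate party over the network.
    """
    n_parties = len(sites)
    alt_acc = [0] * n_parties    # party i's running sum of alt-count shares
    total_acc = [0] * n_parties  # party i's running sum of total-count shares
    for genotypes in sites.values():
        counts = local_allele_counts(genotypes)  # computed where the data lives
        for i, s in enumerate(split_into_shares(counts["alt"], n_parties)):
            alt_acc[i] = (alt_acc[i] + s) % PRIME
        for i, s in enumerate(split_into_shares(counts["total"], n_parties)):
            total_acc[i] = (total_acc[i] + s) % PRIME
    # Only the combined aggregates are reconstructed, never per-site values.
    alt = sum(alt_acc) % PRIME
    total = sum(total_acc) % PRIME
    return alt / total


if __name__ == "__main__":
    # Hypothetical genotypes held by three independent organisations.
    sites = {
        "hospital_a": [0, 1, 2, 0],
        "biobank_b": [1, 1, 0],
        "cohort_c": [2, 0, 0, 1, 0],
    }
    print(federated_allele_frequency(sites))  # 8 alt alleles out of 24
```

Because every party sees only uniformly random shares of each site’s counts, no individual party learns another site’s numbers; only the final combined frequency is revealed. Production systems add protections this sketch omits, such as authenticated channels and defences against parties that deviate from the protocol.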
Scaling genomic big data analyses to millions of individuals is an overwhelming challenge, and security and compliance are paramount considerations. Lifebit CloudOS addresses this challenge through federation. Unlike any other genomics platform, Lifebit CloudOS runs analyses on distributed data where it resides, rather than moving it. Organisations that have deployed Lifebit CloudOS tell me they no longer worry about the very real risks that moving and sharing sensitive data posed before they adopted Lifebit’s federated solution.
We engineered Lifebit CloudOS as a fully federated multiparty system that allows genomic data to stay in your own secure cloud, HPC or hybrid IT environment. Lifebit CloudOS does not ingest or transfer any data, metadata or log information. By applying highly optimised privacy-preserving and secure computation techniques to safeguard genomic data sharing and analysis, we are now much closer to solving the issue of genetic data privacy and compliance.