Unwrapping Lifebit’s Data Security Layers With David Ardley
Divya Narasimhan, MSc
Our understanding of the human genome is rapidly evolving as data custodians like Genomics England and The Hong Kong Genome Project gather large volumes of biological information from patients. Pharmaceutical companies and scientific researchers rely on this big data to drive drug target discovery, but face the formidable challenge of accessing sensitive patient data securely. As healthcare races to keep pace with technology, innovative security measures are needed to ensure that advances in genomic medicine keep data security and privacy at their core.
Lifebit’s end-to-end precision medicine platform, Lifebit CloudOS, draws on years of accumulated expertise to implement security standards adopted from fields as far removed as banking. David Ardley, Lifebit’s Platform Delivery Director, previously built secure IT infrastructure for JP Morgan and the Bank of America before moving into genomics as CTO and Head of Platforms at Genomics England. In this interview, we explore lessons learned about the importance of deploying data privacy and integrity measures in genomics.
Biobanks Need Security That Matches and Exceeds Financial Banks
When personal data promises to power huge enterprises, hackers seek to profit from owning sensitive patient intel, driving up the premium placed on data security. When biobanks collect data from patients, participants volunteer their informed consent for the use of their data in driving medical research forward. It is therefore imperative that data obtained from these patients be guarded against misuse and security threats. Data protection laws, such as the General Data Protection Regulation (GDPR) in the EU, impose privacy obligations on companies handling sensitive personal data and require them to comply with security measures.
Since genomic medicine is still a relatively new discipline, winning public confidence in trailblazing projects like the 100,000 Genomes Project is a landmark achievement, and one that can only be upheld by maintaining strong data security and privacy measures.
“There is an inherent need to ensure that people feel that this data is protected and trusted […] to maintain the momentum that’s been gained so far in this area.”
David highlighted a number of parallels between the worlds of banking and genomics. Just as financial institutions handle trillions of dollars in transactions, genomic biobanks process many terabytes of data. Likewise, the high-speed, high-volume transactions of finance have a counterpart in genomic data processing and in data uploads and downloads. Identifiable genomic data is susceptible to interception and leaks and, as in banking, needs to be safely stored and encrypted. Finally, the scale of infrastructure and the speed required to power highly sensitive financial transactions match the needs of genomic data security, illustrating that cross-pollination between fields could improve the ultimate customer experience in either sector.
However, David noted that the genomics sector is unique in that researchers are highly driven to share data, to facilitate collaborations and prevent potentially life-saving data being lost in silos (i.e. where data is held and accessed by a single group or organisation). Scientists are thus turning to cloud-based and federated solutions to expedite collaborations, underscoring the need for advanced software architectures which balance data security with research capability.
Genomics Is Increasingly Turning to Cloud Data Storage Over On-Premise Solutions
The scale and sensitivity of genomic data bring unique challenges with respect to storage, management, analysis and collaboration. Advanced cloud-based storage is becoming increasingly necessary over on-premise solutions to effectively manage and utilise large-scale clinico-genomic data.
From his vast experience across banking and software research technology, David draws an analogy to help us better understand cloud storage:
“Imagine storing valuables at home. We could rig an alarm system, fasten the doors and windows, lock up our belongings; but there are limits to what can be done to safeguard them. This is similar to storing genomics data on-premise, or genetic data on a USB drive.
But if we want better security, we take our valuables to the bank, and store them in a safety deposit box. Banks are equipped to offer more protection, better controls, and sophisticated technology that prevent security breaches. This is what cloud storage is for data, where access controls and security concerns are managed or regulated by the chosen cloud service.”
Cloud storage also accommodates large volumes of data, surpassing the scale of storage that can be implemented on-premise. However, the risks of ‘transporting’ or sharing data via the cloud are the same as those of transferring it from on-premise storage.
Cloud storage encrypts data by default, both ‘at rest’ (i.e. while the data is stored) and ‘in transit’ (i.e. while the data is being transported), ensuring data privacy and safe transmission. Data can only be decrypted by authenticated staff, and the security network adds further layers of protection by constraining which specific users can view, read, or edit encrypted files.
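To make the idea of per-user constraints concrete, here is a minimal sketch of a permission check. It is an illustration only and does not describe how any particular cloud service or Lifebit CloudOS implements access control; the user names, file name and the GRANTS table are hypothetical.

```python
from enum import Flag, auto


class Permission(Flag):
    """The three kinds of access mentioned above, modelled as combinable flags."""
    NONE = 0
    VIEW = auto()
    READ = auto()
    EDIT = auto()


# Hypothetical grants: (user, file) -> permissions that user holds on that file.
GRANTS = {
    ("alice", "cohort.vcf"): Permission.VIEW | Permission.READ,
    ("bob", "cohort.vcf"): Permission.VIEW,
}


def is_allowed(user, file_name, needed):
    """Return True only if `user` has been explicitly granted `needed` on `file_name`."""
    granted = GRANTS.get((user, file_name), Permission.NONE)
    return needed in granted


print(is_allowed("alice", "cohort.vcf", Permission.READ))  # True
print(is_allowed("bob", "cohort.vcf", Permission.READ))    # False
```

The point of the sketch is the default: a user with no entry in the table is denied everything, so access has to be granted explicitly.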
Data Encryption, Data De-Identification and Airlocks
Data encryption is a security measure that essentially locks an entire data file, meaning it cannot be accessed without a password or decryption key. Once opened, the content can be viewed in its entirety unless patient-sensitive data are masked or de-identified.
De-identification, as David describes, “effectively masks certain data within the file”, preventing a file from being traced back to the file owner and masking any potentially identifiable information with a random number or string. Only data fields with pertinent information for computational data analysis can be accessed and used by the researcher. Lifebit CloudOS implements both encryption and de-identification to ensure patient data security and privacy.
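As a rough illustration of the masking David describes, the sketch below replaces identifying fields with random strings while leaving analysis fields untouched. It is not Lifebit's implementation; the field names, the sample record and the choice to reuse one token per original value are assumptions made for the example.

```python
import secrets

# Hypothetical set of fields treated as identifying; a real schema would differ.
IDENTIFYING_FIELDS = {"name", "patient_id", "date_of_birth"}


def deidentify(record, pseudonyms):
    """Return a copy of `record` with identifying fields replaced by random tokens.

    `pseudonyms` remembers the token issued for each original value, so the
    same patient receives the same token across records (an assumption here).
    """
    masked = {}
    for field, value in record.items():
        if field in IDENTIFYING_FIELDS:
            key = (field, value)
            if key not in pseudonyms:
                # 8 random bytes, rendered as 16 hexadecimal characters.
                pseudonyms[key] = secrets.token_hex(8)
            masked[field] = pseudonyms[key]
        else:
            # Fields needed for analysis pass through unchanged.
            masked[field] = value
    return masked


record = {
    "name": "Jane Doe",
    "patient_id": "P-0001",
    "date_of_birth": "1980-01-01",
    "gene": "BRCA1",
    "age_band": "40-49",
}
pseudonyms = {}
masked = deidentify(record, pseudonyms)
print(masked["gene"], masked["age_band"])  # analysis fields are still readable
```

In this sketch the pseudonyms table is the sensitive part: whoever holds it can reverse the masking, so it would need to be kept apart from the data handed to researchers.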
Within both the Lifebit CloudOS platform and the Research Environment set up for Genomics England, a data ‘Airlock’ system is used as an additional security layer to manage any movement of sensitive data. As David explains, “You can run analyses, you can create results, but they’re typically locked in the platform”. Any movement of data into or out of the data platform goes through the Airlock system, whereby the data movement is subject to review by an authorized team: “it is essentially an independent validation that the specific data is authorized to move”.
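The review step can be pictured as a gate that refuses to release anything that has not been approved. The sketch below is a simplified model, not the actual Airlock; the class names, the reviewer list and the rule that requesters cannot approve their own exports are assumptions used to illustrate “independent validation”.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExportRequest:
    """A request to move one file out of the platform (hypothetical structure)."""
    requester: str
    file_name: str
    approved_by: Optional[str] = None


class Airlock:
    """Holds data movements until an authorised reviewer has approved them."""

    def __init__(self, reviewers):
        self.reviewers = set(reviewers)

    def approve(self, request, reviewer):
        if reviewer not in self.reviewers:
            raise PermissionError(f"{reviewer} is not an authorised reviewer")
        # Assumption: independence means the reviewer is not the requester.
        if reviewer == request.requester:
            raise PermissionError("requesters cannot approve their own exports")
        request.approved_by = reviewer

    def release(self, request):
        if request.approved_by is None:
            raise PermissionError(f"{request.file_name} has not been reviewed")
        return f"{request.file_name} released (approved by {request.approved_by})"


airlock = Airlock(reviewers={"reviewer_a"})
request = ExportRequest(requester="researcher_1", file_name="results.csv")
airlock.approve(request, "reviewer_a")
print(airlock.release(request))
```

Calling release before approve raises PermissionError, which mirrors the default of results staying locked in the platform.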
Lifebit’s Unique Role In Securing Data Analytics
Due to the sensitivity of participant data, the default is to encrypt the data and limit access to those who are specifically granted it. However, the Lifebit CloudOS platform adds a further layer of security: owing to its federated architecture, data is never transferred or moved from the client’s secure on-premise or cloud environment. The data is remotely accessed by approved researchers without the need to move it around and undermine security during transit.
“We set up workspaces where data is made visible to consumers or researchers who can then run analysis and tools, but we don’t actually ingest that data,” says David. “We federate to that data. So a customer or client can connect their own data source to Lifebit CloudOS. We then read that data in order to do computations, but we don’t actually store the data.”
Thus, Lifebit’s clients retain full ownership of their data, with multiple levels of protection ensuring that access is granted only to those whom the client wishes to have visibility of that data. Computational resources and bioinformatics tools can be imported into the workspace to analyze data without breaching privacy laws, as the data is never moved from the client’s original environment. Lifebit CloudOS thus becomes an additional security ‘wrapper’ within the cloud environment, without compromising on data privacy.
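The general pattern of sending the computation to the data, rather than the data to the computation, can be sketched in a few lines. This is a toy model under stated assumptions, not the Lifebit CloudOS architecture: DataSite, the sample records and the mean-age analysis are all hypothetical, and nothing in the sketch enforces that an analysis returns only aggregates.

```python
class DataSite:
    """Stands in for a client's environment; records are held inside this object."""

    def __init__(self, name, records):
        self.name = name
        self._records = records

    def run(self, analysis):
        # The analysis executes where the data lives; only its result is returned.
        return analysis(self._records)


def count_and_sum_age(records):
    """Local step: reduce a site's records to two aggregate numbers."""
    return {"n": len(records), "age_sum": sum(r["age"] for r in records)}


def federated_mean_age(sites):
    """Central step: combine per-site aggregates without seeing any record."""
    partials = [site.run(count_and_sum_age) for site in sites]
    total_n = sum(p["n"] for p in partials)
    total_age = sum(p["age_sum"] for p in partials)
    return total_age / total_n


sites = [
    DataSite("site_a", [{"age": 40}, {"age": 50}]),
    DataSite("site_b", [{"age": 60}]),
]
print(federated_mean_age(sites))  # 50.0
```

Only the counts and sums cross the site boundary, which is why the combined mean can be computed across sites while each dataset stays in place.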
What is Lifebit Doing to Continue Evolving In the Genomics Space?
“The bigger the dataset, the more reliable and richer the outcome,” David said, highlighting that the challenge facing the genomics field now and in the future is the need to globalise: “We need to have data federated across countries…driven by innovation within data privacy.”
At the onset of the COVID-19 pandemic, health authorities around the world scrambled to rapidly sequence viral strains, standardise patient clinical data and report on disease outcomes, leading to widely divergent tactics in coping with the disease. This highlighted the need for unified solutions for tackling disease and cataloging clinico-genomic data.
Lifebit CloudOS hosts a powerful suite of industry-standard and novel AI analytics tools. Thanks to its federated architecture, these tools are deployed to wherever the data resides, enabling secure data access across countries. This paves the way for more unified solutions that can be shared globally to drive evidence-based research and dynamic treatments tailored to specific geographical regions.
As scientific researchers look to leverage genomic data to deliver the latest advances in precision medicine, it is equally important to deploy stringent data security measures, making sure that patient data privacy is paramount and public trust in genomics research is maintained. Securely providing researchers with access to relevant patient clinical information at scale will be crucial in accelerating precision medicine.
If you have any questions about how Lifebit maintains data security in genomics or would like to learn more about secure data democratisation, get in touch with us here.
David Ardley is Director of Platforms at Lifebit. With his vast experience in the delivery of highly secure global IT infrastructure, David leads security, infrastructure development, design and implementation for Lifebit’s leading clients. When he is not solving complex security needs, David can be found playing the guitar, musing over blockchain applications, or competing at the world champs for the Great Britain triathlon team.