How is federated data analysis boosting genomics research?
Hannah Gaimster, PhD
Genomic sequencing projects are taking place in countries around the world to enable population-level genomic medicine. Genomics is the study of the complete set of DNA in a person or other organism. With DNA underpinning a large proportion of an individual's health and disease status, genomics is beginning to gain traction in clinical settings as part of a personalised medicine approach.
Combining information on a patient’s clinical outcomes—which examine observable changes in health and wellbeing—with their genomic data can help us better understand how their genome affects their disease risk. Increasingly, breakthroughs in diagnostics, drug development, and targeted therapies are being made possible by advances in our understanding of the genome.
For example, the field of rare diseases can particularly benefit from genomics research. Understanding the genomic data of patients affected by rare diseases can help uncover new diagnoses and allow more targeted clinical care to be given, ultimately improving patient outcomes.
However, researchers can struggle to access and analyse the relevant genomic data to power their research. There are three reasons for this:
- Data security and patient privacy are at risk when data is moved. Strict regulatory frameworks (such as the General Data Protection Regulation, GDPR) that differ from country to country make it near impossible to collaborate across borders using a traditional model of data sharing.
- Roughly 2 to 40 billion gigabytes of data are now generated each year in the genomics field, making data duplication or movement inefficient, expensive and difficult.
- Even if researchers can access disparate datasets, these may not be in the correct format to enable easy collaboration, or may not be provided on an easy-to-use, low-code platform.
Consequently, large-scale genomic data migration has become unfeasible. Data federation is a breakthrough approach that is increasingly used to give researchers secure access to data.
Data federation can enhance genomics research
The video below demonstrates how federated data analysis functions. Historically, researchers have typically had to download genomic data from disparate sources and analyse it together within a centralised location (steps 1 and 2). Federated analysis (step 3) allows distributed genomic data from multiple sources to be analysed in parallel, saving the researcher time and money while also keeping the sensitive data secure.
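The idea of analysing data where it sits can be illustrated with a minimal Python sketch. It is not Lifebit's implementation: the function names, the site names and the toy genotypes are all hypothetical, and the "sites" are simulated in one process rather than being separate secure environments. Each site computes a small aggregate summary locally, and only those summaries, never the individual-level genotypes, are pooled by a coordinator.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleCounts:
    """Aggregate summary a site releases (no individual-level data)."""
    alt_alleles: int    # copies of the alternate allele observed
    total_alleles: int  # alleles genotyped (2 per diploid sample)


def local_summary(genotypes: list[int]) -> AlleleCounts:
    """Stands in for code that would run inside a site's secure environment.

    `genotypes` holds the alternate-allele count (0, 1 or 2) per sample.
    """
    return AlleleCounts(sum(genotypes), 2 * len(genotypes))


def federated_allele_frequency(sites: dict[str, list[int]]) -> float:
    """Coordinator: gather per-site summaries in parallel, then pool them."""
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(local_summary, sites.values()))
    alt = sum(s.alt_alleles for s in summaries)
    total = sum(s.total_alleles for s in summaries)
    return alt / total


# Hypothetical toy data for one variant at two sites (assumption, not real data).
SITES = {
    "site_a": [0, 1, 2, 0],
    "site_b": [1, 1, 0, 0, 0, 2],
}

if __name__ == "__main__":
    print(f"Pooled allele frequency: {federated_allele_frequency(SITES):.2f}")
```

In a real deployment the call to `local_summary` would be a request sent to each remote environment, subject to the authentication and security measures that federation requires; the pooling step would stay the same.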
There are four important requirements for an organisation or researcher to be able to perform data federation. These are:
- appropriate computing infrastructure,
- authentication and analytics technology,
- standardised and interoperable data,
- and robust security measures.
A federated approach to genomic data analysis allows researchers and clinicians to securely combine global cohorts of genomic data, maximising the new scientific discoveries that can be made. This article discusses key examples where data federation is enabling genomics research worldwide.
Federated data analysis and the COVID-19 pandemic
In the UK, genomic medicine efforts have been spearheaded by Genomics England and its 100,000 Genomes Project, one of the largest cohorts of rare disease and cancer patients globally.
During the pandemic, Genomics England worked with the National Health Service in England to deliver whole genome sequencing of up to 20,000 COVID-19 intensive care patients, and up to 15,000 people with mild symptoms.
This allowed researchers to query, analyse and collaborate over these very large sets of genomic and medical data in real time. The enhanced functionality and automated tools helped researchers understand the underlying genetic factors that may explain what makes some patients more susceptible to the virus, or more severely ill when infected.
Multi-party data federation can increase global collaboration
Recently, multi-party federation was successfully demonstrated between trusted research environments (TREs) for the first time in the UK, linking the TREs of the University of Cambridge and Genomics England.
A consortium was formed between Lifebit and its partners as part of the Data and Analytics Research Environments UK (DARE UK) programme, which is funded by UK Research & Innovation and delivered in partnership with Health Data Research UK (HDR UK) and Administrative Data Research UK (ADR UK).
The consortium set out to build a federated ‘virtual’ link between the TREs of NIHR Cambridge BRC and Genomics England. With multi-party federation, distributed genomic data sources could be accessed and utilised simultaneously without having to physically move the data.
Professor Serena Nik-Zainal, NIHR Research Professor and Honorary Consultant in Clinical Genetics, University of Cambridge, said:
“This technology has the potential to remove the geographical, logistical, and financial barriers associated with moving exceptionally large datasets. For genomics research, the potential to undertake research across multiple datasets means access to much greater and more diverse data. Applied at scale, this means huge potential for new discoveries, particularly for research into rare diseases and for reducing health inequalities.”
By enabling rapid access to data and secure data sharing, all at reduced costs, these impactful efforts are changing the nature of research collaboration globally for the better. Furthermore, they reduce the time burden on researchers when analysing integrated cohorts, maximising the time they can spend generating new insights and accelerating novel discoveries.
Federated data analysis in newborn genomic screening
One example of where newborn genomic screening is being introduced, with federated data analysis helping to link global cohorts of data for secure analysis, is Greece.
Here, PlumCare RWE and Lifebit have begun a partnership to support Greece’s pioneering national newborn genomic sequencing programme, First Steps.
Researchers will be able to access and analyse data securely in combination with global cohorts, whilst ensuring data is kept safe, private and in place in their secure environment using Lifebit’s federated technology.
This will help detect approximately 400 early-onset but actionable genetic conditions in newborns, ensuring appropriate treatments can be given as early as possible to help limit the impact of the diseases.
In conclusion, data federation can aid researchers, organisations and governments in securely accessing and analysing genomic data in a variety of ways. Federated data analysis is especially valuable in genomics research as it circumvents issues surrounding data privacy and security and the movement of large datasets. Data federation can give researchers secure access to genomic datasets globally, enabling them to run analyses, find answers to pressing research questions, and accelerate scientific discoveries.
Author: Hannah Gaimster, PhD
Contributors: Hadley E. Sheppard, PhD and Amanda White
At Lifebit, we develop secure federated data analysis solutions for clients including Genomics England, NIHR Cambridge Biomedical Research Centre, Danish National Genome Centre and Boehringer Ingelheim to help researchers turn data into discoveries.