How do end-to-end federated data platforms democratise data access and usability?
Hadley Sheppard, PhD
27th June 2023
In the last twenty years, there has been an explosion in the production of patient-derived biomedical data. This includes clinical-genomic, Electronic Health Record (EHR), and real-world data (RWD) datasets, which, when used together, can help uncover the underlying causes of disease.
Unfortunately, the transformative potential of this health data has yet to be realised. To preserve patient privacy, much of the world’s health data is stored within siloed institutional environments that are either unavailable to researchers or difficult to access, link and analyse. Even when researchers can access this data, they are not always equipped with the resources and tools to derive meaningful insights from it.
To support research and innovation through the power of data, solutions are needed to enable data access, linkage, and analysis while maintaining security.
Data federation has emerged as a solution to increase the usability of sensitive biomedical health data for diagnosing and treating disease. Through data federation, researchers can be virtually linked to datasets of interest that are safely housed in highly secure computing environments known as Trusted Research Environments (TREs). The data is never physically moved or copied, and the data controller or custodian maintains control.
This article describes how data federation technologies are used to develop end-to-end federated data platforms, which ultimately help democratise access to data and its usability. By developing ways to fairly distribute secure access to data, tools, and knowledge, the scientific and health communities will accelerate global collaborations and therapeutic findings that will ultimately benefit patients.
What is an end-to-end federated data platform?
In its simplest terms, data federation is a software process that allows multiple databases to function as one. This technology is highly relevant for storing sensitive biomedical health data, as the data remains within appropriate jurisdictional boundaries while metadata (information that describes the data) is centralised and searchable. This is an alternative to a model in which data is moved or duplicated and then centrally housed.
The federated architectures of individual organisations may be connected into a federated data platform, enabling data access and computation for users across organisations. A prominent example of efforts towards federated data platforms is the work of the UK’s National Health Service to securely connect UK health data for approved research use.
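To make the idea concrete, the following is a minimal sketch of a federated layer, not a description of any particular product. The `Site` and `Federation` classes, the site names, and the toy records are all hypothetical and introduced only for illustration. It assumes each site exposes nothing more than a free-text description (the centralised metadata) and an aggregate count.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical data custodian whose records stay in its own environment."""
    name: str
    jurisdiction: str
    description: str  # metadata: the only part shared centrally
    _records: list = field(default_factory=list, repr=False)  # never copied out

    def count_where(self, predicate) -> int:
        # The query runs inside the site; only an aggregate count is returned.
        return sum(1 for record in self._records if predicate(record))


class Federation:
    """Hypothetical central layer: holds metadata and forwards queries to sites."""

    def __init__(self, sites):
        self.sites = list(sites)

    def search_catalogue(self, keyword: str) -> list[str]:
        # Searches the centralised metadata, not the underlying records.
        keyword = keyword.lower()
        return [s.name for s in self.sites if keyword in s.description.lower()]

    def federated_count(self, predicate) -> dict[str, int]:
        # Each site evaluates the query locally and reports back a count.
        return {s.name: s.count_where(predicate) for s in self.sites}


# Toy data, invented for this example.
site_a = Site("site_a", "UK", "Whole-genome sequencing with linked EHR",
              [{"age": 61, "diagnosis": "T2D"}, {"age": 45, "diagnosis": "asthma"}])
site_b = Site("site_b", "DK", "Genotyping array cohort",
              [{"age": 70, "diagnosis": "T2D"}, {"age": 52, "diagnosis": "T2D"}])

federation = Federation([site_a, site_b])
matches = federation.search_catalogue("genom")
counts = federation.federated_count(lambda r: r["diagnosis"] == "T2D")
total = sum(counts.values())
```

From the user's point of view the two sites behave like one searchable resource, yet the only things that cross a site boundary in this sketch are descriptions and counts.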
Federated data platforms democratise data access by giving approved users a means to securely query data irrespective of their physical proximity to where that data resides. However, access is only one step in enabling the wider research community. Incorporating federated analysis into the platform equips researchers and clinicians to derive meaningful insights from that data.
In an example of an end-to-end federated platform for genomic medicine, genomic or phenotypic clinical data is first collected and transformed into interoperable formats. Next, the data is ingested into the federated architecture, which allows authorised users to access and combine it with other disparate sources, perform federated queries, and build unique and valuable cohorts. Researchers can then use analytical tools and pipelines built into the platform, while strict security measures govern the export of results, enabling therapeutic progress, discovery and informed clinical decisions.
Image obtained from the following article: https://www.frontiersin.org/articles/10.3389/fgene.2022.1045450/full
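The flow described above (transform into an interoperable format, query in place, govern what is exported) can be sketched in a few lines. This is an illustrative outline under stated assumptions rather than a real platform's implementation: the field mappings, the helper names `harmonise`, `local_cohort_size` and `approve_export`, and the minimum export count of 5 are all hypothetical and do not come from the article.

```python
MIN_EXPORT_COUNT = 5  # hypothetical disclosure threshold, chosen for illustration


def harmonise(raw: dict, mapping: dict) -> dict:
    """Rename site-specific field names to a shared schema."""
    return {mapping[key]: value for key, value in raw.items() if key in mapping}


def local_cohort_size(records, min_age: int, diagnosis: str) -> int:
    """Runs inside one site and returns only a count, never the records."""
    return sum(
        1 for r in records
        if r["age"] >= min_age and r["diagnosis"] == diagnosis
    )


def approve_export(counts: dict) -> dict:
    """Suppress small counts (replace with None) before results leave the platform."""
    return {
        site: (n if n >= MIN_EXPORT_COUNT else None)
        for site, n in counts.items()
    }


# Step 1: each site stores data under its own field names (toy data).
site_a_raw = [{"patient_age": 50 + i, "dx": "T2D"} for i in range(6)]
site_b_raw = [
    {"Age": 48, "Diagnosis": "T2D"},
    {"Age": 63, "Diagnosis": "T2D"},
    {"Age": 71, "Diagnosis": "asthma"},
]

# Step 2: transform into one interoperable schema before ingestion.
site_a = [harmonise(r, {"patient_age": "age", "dx": "diagnosis"}) for r in site_a_raw]
site_b = [harmonise(r, {"Age": "age", "Diagnosis": "diagnosis"}) for r in site_b_raw]

# Step 3: the same cohort definition is evaluated at every site.
counts = {
    "site_a": local_cohort_size(site_a, min_age=50, diagnosis="T2D"),
    "site_b": local_cohort_size(site_b, min_age=50, diagnosis="T2D"),
}

# Step 4: export governance is applied to the results, not to the raw data.
exported = approve_export(counts)
```

Because harmonisation happens first, one cohort definition works across both sites. The export check is deliberately the last step, so that anything leaving the platform has passed through it.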
Beyond FAIR: How can low/no-code tools contribute to data democratisation?
Following FAIR principles - ensuring datasets are Findable, Accessible, Interoperable and Reusable - is an essential step in promoting the democratisation of data and data quality, but it does not address all of the challenges associated with deriving insight. In particular, researchers and clinicians without a data science background may be at a disadvantage when using analytical tools that require coding.
Interestingly, the software industry is currently shifting towards “low/no-code” tools to support a wider range of end users with and without a data science background, thus helping to democratise access to genomic data and the insights derived from it. Examples of low/no-code resources include the following:
- The Galaxy Community: An initiative within ELIXIR, a federated data infrastructure that brings together life science data sources across Europe. This research forum offers a web-based platform to facilitate computational research for a variety of “omics” types and is specifically targeted at users without programming experience.
- The Dependency Map (DepMap) Portal: With a mission of mapping the landscape of cancer vulnerabilities across all cancers, DepMap offers an easy-to-use graphical user interface to explore cancer vulnerabilities from available chemical and genetic perturbation data using analytical and visualisation tools.
- The National Institutes of Health’s (NIH) Common Fund Data Ecosystem (CFDE) Search Portal: The CFDE is a comprehensive resource for datasets generated through NIH funding, with the ultimate goal of making data more usable and useful for researchers and clinicians. It offers interactive search functions and visualisations of gene-specific, compound-specific, and disease-specific data, among others, to empower researchers with and without a data science background.
If tools such as those described above could consistently be implemented within federated data platforms, researchers of diverse backgrounds could spend more time on what matters most - accessing global cohorts securely to progress therapeutic discovery.
Important considerations when democratising data access
A core benefit of federated data platforms is that they can democratise access to health data in a secure manner. While this brings huge potential for advancing medical research, there must be strict regulations over how data is governed and accessed, applied at both the organisational and researcher levels, in order to engender public and participant trust.
In line with the surge in data regulations arising across global jurisdictions, accreditation schemes that audit and certify the “owner” of data management platforms are becoming more prevalent. To guarantee ethical and secure usage of federated platforms, the safety and governance of these infrastructures must be regularly reviewed and measured against all aspects relevant to data security and governance. These range from implementing industry-recognised data protection frameworks, standards and information security measures to complying with local data regulations and commitments to interoperability.
Access to the data within these federated platforms must be appropriately reviewed and governed by the data controllers. Establishing governance and regulatory bodies that oversee the use of data can help foster public trust in federated research and ensure data use is in the interest of both the public and participants.
Federated data platforms are emerging globally as essential infrastructure that can scale with increasing volumes of data and ensure its protection, all the while enabling secure access for approved research. This ultimately democratises data access and enables widespread benefit sharing. End-to-end platforms take this one step further by providing researchers with the analytical tools they need to derive insight from biomedical data, regardless of background.
Moving forward, it will be interesting to see how widely accreditation frameworks and regulatory bodies for end-to-end federated data platforms are implemented, so that best practices are followed with this data and the interests of the public and patients are protected.
Lifebit works proactively with clients, including Genomics England, the Danish National Genome Centre, Boehringer Ingelheim, NIHR Cambridge Biomedical Research Centre, and others, to comply with sensitive data requirements and establish their end-to-end federated solutions. We ensure that organisations can meet and exceed industry standards amidst the changing regulatory and regional landscape - enabling valuable research at scale to improve patients’ lives.
Authors: Hadley E. Sheppard, PhD
Contributors: Hannah Gaimster, PhD and Amanda White
To find out more:
Read Lifebit’s whitepaper on best practices for building a Trusted Research Environment
Read Lifebit’s whitepaper on security and data governance
Read Lifebit’s whitepaper on data standardisation