
The Complete Guide to Trusted Research Environments in 2023

Lifebit


 

In this article:

 

  1. What is a Trusted Research Environment
  2. Why do we need Trusted Research Environments
  3. Defining the Key Features of a Trusted Research Environment
  4. Advantages of Trusted Research Environments
  5. How are Trusted Research Environments Being Used in Healthcare and Research Today
  6. Challenges and Priorities for Trusted Research Environments in the Future

 

What is a Trusted Research Environment?


Trusted Research Environments (TREs) are highly secure and controlled computing environments that allow researchers to gain access to data in a safe way. Also known as “Data Safe Havens” or “Secure Data Environments”, these secure digital environments enable approved researchers to remotely access, store, and analyse sensitive data in a single location.
Designed to protect the privacy and security of sensitive data, trusted research environments have been supporting the secure sharing of sensitive data in the UK since 2013. TREs are used by a range of organisations and industries, including research institutions, universities, health systems, charities and government bodies. [1][2][3][4] These can be fully open-source (e.g., OpenSAFELY), built in-house, or built by commercial companies, with diverse benefits and features across these varied approaches.

TREs support the highest level of data governance by removing the need to share data physically among researchers and organisations.
Data instead remains in a secure environment and is analysed in situ by authorised researchers with tools available in the TRE.

With clear evidence that health, care and research and development sectors require deep, linked health-related data, trusted research environments are increasingly recognised as a solution that can provide secure access and analytics functionality to authorised researchers, while also increasing public trust in data use. As such, the trusted research environment landscape and associated technology are evolving rapidly in the UK and further afield.



 

Why do we need Trusted Research Environments?


Making use of large-scale health data

The opportunities for data-driven research and innovation have never been larger, and the availability of large-scale health data for research is immense. In genomics, for example, roughly 2 to 40 billion gigabytes of data are now generated each year. This health data holds huge potential to accelerate society’s understanding of how to detect, prevent, and treat disease.

Studying larger sample datasets can lead to increased insights, as shown in numerous genetic association studies. For example, the first schizophrenia-associated variant was identified using a cohort of 3,000 individuals, yet subsequent analysis of a cohort 10 times larger uncovered over 100 times as many variants. [5]

 

Traditional data sharing models are no longer secure or scalable

However, the potential of health data is far from being realised. To preserve patient privacy, much of the world’s health data is stored within siloed institutional environments that are unavailable to researchers or difficult to access. [6] Agreements to enable data sharing between organisations are complex, and even where researchers are eligible for access, approval can typically take organisations six months or longer. [7]

Traditional modes of data access and sharing rely on sensitive datasets being copied, moved, or downloaded onto personal or organisational devices or centralised platforms. Given the sensitive nature and sheer scale of health and genomic data, this mode of data access is inefficient and unsustainable.

Further, with an alarming rise in reports of large-scale data breaches and data mining activities, and a long-overdue shift in public awareness towards personal data sovereignty, maintaining public trust in health data research is critical. [8][9][10]

 

Trusted research environments are a scalable, long-term solution for health data access

TREs can address some of the concerns around data security and patient privacy - with multi-layered security controls and robust monitoring and auditing capabilities. Importantly, trusted research environments represent a shift in data access from a ‘lending library’ to a ‘reading library’ approach. In the TRE model, approved researchers can use the data within the library, but this information never leaves the library.

Further, trusted research environments provide the functionality and infrastructure to support research on sensitive health data at scale. They are solving the problem of authorised data sharing by enabling research progress without sacrificing data security - ensuring data are handled in a secure and responsible manner.


Defining the Key Features of a Trusted Research Environment


In order to power research and progress therapeutic development while maintaining public trust, trusted research environments must strike the delicate balance between usability and security. As trusted research environments are built and procured across industries, there are several important features needed to ensure safe data access:

 

The Five Safes framework

A recommended central feature of trusted research environments is the Five Safes framework. Originating from the UK’s Office for National Statistics, it consists of five pillars - safe people, safe projects, safe settings, safe data and safe outputs. The framework’s pillars span all stages of data management to make data available for research, while protecting confidentiality at all times. This set of principles is widely regarded as the gold standard for sensitive data protection.

A recent white paper from the UK Health Data Research Alliance, convened by Health Data Research UK (HDR UK), built upon this framework to establish guidelines and best practices for building trusted research environments, ensuring data services (like trusted research environment providers) provide safe access to data.[2] 
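As a rough illustration of how the five pillars might be tracked for a single access request, the sketch below models them as a checklist. The class name, the helper function and the one-line gloss on each pillar are assumptions made for this example rather than part of the framework's official wording.

```python
from dataclasses import dataclass, fields


@dataclass
class FiveSafesReview:
    """Hypothetical checklist: one flag per pillar of the Five Safes."""
    safe_people: bool    # researchers are trained and authorised
    safe_projects: bool  # the project is an appropriate use of the data
    safe_settings: bool  # access happens inside the secure environment
    safe_data: bool      # data has been treated to limit disclosure risk
    safe_outputs: bool   # results are checked before release


def unmet_safes(review: FiveSafesReview) -> list:
    """Return the names of any pillars that have not been satisfied."""
    return [f.name for f in fields(review) if not getattr(review, f.name)]


# Example request where output checking is still outstanding.
review = FiveSafesReview(True, True, True, True, safe_outputs=False)
outstanding = unmet_safes(review)
```

In this sketch a request would only proceed once `unmet_safes` returns an empty list, which reflects the idea that the pillars work together rather than in isolation.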


 

 

Beyond the Five Safes, there are several key features and best practices of trusted research environments that are needed to enable researchers to safely and effectively access and analyse data - both in terms of safeguarding sensitive data and providing the analytics and infrastructure to support research at scale.

How data is safeguarded in a Trusted Research Environment

Custodians (e.g., biobanks and healthcare providers) of health data cohorts have been tasked with a critical role of safeguarding participants’ data. As part of an organisational-level data governance framework, trusted research environments need a multi-layered approach to safeguarding sensitive data, to ensure data are handled in a secure and responsible manner. Alongside ethical approval for data access that involves patients and the public in decision making, this governance framework can help to build public trust. 


Well-defined governance frameworks lay out the roles and responsibilities of different stakeholders, including researchers, institutional review boards, and information security teams, to ensure that patient data is handled responsibly. However, this can become increasingly complex, with data governance standards rapidly changing across regions and between institutions. Working with a trusted research environment provider can alleviate these complications. When choosing a provider, certifications in industry-recognised standards including ISO 27001 and Cyber Essentials Plus signify that the provider is well equipped to manage private and sensitive data.

 

What measures can be taken to safeguard participant data within a Trusted Research Environment?

Encryption

Data encryption is the process of converting plain text (unencrypted) information into an unreadable ciphertext (encrypted) format, using an encryption algorithm and a secret key, with the purpose of maintaining the confidentiality and privacy of the information. The encrypted data can only be decrypted and read by someone with access to the correct decryption key.
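To make the role of the secret key concrete, here is a toy sketch using a one-time pad (XOR with a random key as long as the message), built only from the Python standard library. This is an illustrative assumption, not how any particular TRE encrypts data; production systems rely on vetted algorithms such as AES provided by audited libraries. The function names and the sample record are hypothetical.

```python
import secrets


def generate_key(length: int) -> bytes:
    """Return a random secret key of the given length in bytes."""
    return secrets.token_bytes(length)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (one-time pad)."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must match the data length")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"patient_id=12345;diagnosis=T2D"
key = generate_key(len(plaintext))

ciphertext = xor_bytes(plaintext, key)  # encrypt with the secret key
recovered = xor_bytes(ciphertext, key)  # decrypt with the same key
```

Without `key`, the ciphertext reveals nothing about the plaintext beyond its length; with it, decryption is the same XOR operation applied again.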

Pseudonymisation

Data pseudonymisation is a privacy-enhancing technique that replaces identifiable information, such as personal names and addresses, with a pseudonym, or a fake name, that cannot be traced back to the original information without additional information. Pseudonymisation reduces the risk of a data breach and protects the privacy of individuals by making the data less easily linkable to specific individuals.
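One common way to generate pseudonyms is keyed hashing, sketched below with HMAC-SHA256 from the Python standard library. This is an assumed approach for illustration only; the key value, field names and record are hypothetical, and the secret key stands in for the "additional information" that would be held separately from the research data.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Hypothetical key; in practice it would be held separately by the custodian.
SECRET_KEY = b"example-key-held-by-custodian"

record = {"name": "Jane Doe", "postcode": "AB1 2CD", "hba1c_mmol_mol": 48}

# Replace direct identifiers with a pseudonym and keep only research fields.
pseudonymised = {
    "participant": pseudonymise(
        record["name"] + "|" + record["postcode"], SECRET_KEY
    ),
    "hba1c_mmol_mol": record["hba1c_mmol_mol"],
}
```

Because the same identifier and key always give the same pseudonym, records for one person can still be linked across datasets without exposing who that person is.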

Role-based access controls

Role-based access control (RBAC) is a method of restricting access to a computer or network based on the roles of individual users within an organisation. In RBAC, users are assigned to specific roles, and each role is granted certain permissions, such as access to specific files or applications, or the ability to perform specific actions. The permissions are determined based on the responsibilities and duties associated with each role. This type of access control provides a flexible and scalable way of managing and organising user access.
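The role-to-permission mapping described above can be sketched in a few lines. The role names, permission names and users below are hypothetical examples, not taken from any specific TRE.

```python
# Each role is granted a set of permissions (hypothetical names).
ROLE_PERMISSIONS = {
    "researcher": {"read_dataset", "run_analysis"},
    "data_manager": {"read_dataset", "ingest_dataset"},
    "airlock_reviewer": {"approve_export"},
}

# Each user is assigned one or more roles.
USER_ROLES = {
    "alice": {"researcher"},
    "bob": {"data_manager", "airlock_reviewer"},
}


def is_allowed(user: str, permission: str) -> bool:
    """Return True if any of the user's roles grants the permission."""
    roles = USER_ROLES.get(user, set())
    return any(
        permission in ROLE_PERMISSIONS.get(role, set()) for role in roles
    )
```

Permissions are never attached to users directly: changing what a role can do, or moving a user between roles, updates access in one place, which is where the scalability of RBAC comes from.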

Data export control (Airlock)

These are controls that stop data from being exported or downloaded to external environments without first obtaining approval. An example of this is an ‘Airlock’. Genomics England has a world-renowned Airlock process, which means only the results of an analysis can be exported by users, and authorised personnel must approve and validate the purpose of any data download from the TRE.
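A minimal sketch of an approval-gated export flow is shown below. It is not a description of the Genomics England Airlock; the class, the function names and the rule that requesters cannot review their own export are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExportRequest:
    """Hypothetical request to take a results file out of the environment."""
    requester: str
    filename: str
    purpose: str
    status: str = "pending"
    reviewer: Optional[str] = None


def decide(request: ExportRequest, reviewer: str, approve: bool) -> ExportRequest:
    """Record a reviewer's decision on an export request."""
    # Assumed rule: a second person must always review the export.
    if reviewer == request.requester:
        raise PermissionError("requesters cannot review their own export")
    request.status = "approved" if approve else "rejected"
    request.reviewer = reviewer
    return request


def can_export(request: ExportRequest) -> bool:
    """Only approved requests may leave the environment."""
    return request.status == "approved"


request = ExportRequest("alice", "gwas_summary.csv", "publication figure")
blocked_before_review = not can_export(request)
decide(request, reviewer="bob", approve=True)
```

The important property is the default: a new request starts as `pending`, so nothing leaves the environment until someone with authority has actively approved it.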

Monitoring, logging and auditing

Data activity monitoring is needed so that TRE owners have visibility of who is doing what with the data and for which purposes. TREs should have monitoring capabilities that track and audit analyses and datasets. The TRE must have systems in place that proactively monitor the security of their data in real-time, to identify suspected unauthorised data access, data leaks or anomalous activity and automatically alert the TRE owner.
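The combination of logging and alerting can be sketched as an append-only audit log plus a simple threshold rule. The event fields, the user names and the threshold are hypothetical; real deployments would typically use dedicated logging and security monitoring tools.

```python
from collections import Counter
from datetime import datetime, timezone

audit_log = []


def record_event(user: str, action: str, dataset: str) -> None:
    """Append a timestamped entry describing who did what to which dataset."""
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "dataset": dataset,
    })


def users_over_threshold(log: list, action: str, max_events: int) -> list:
    """Return users whose count of the given action exceeds the threshold."""
    counts = Counter(e["user"] for e in log if e["action"] == action)
    return sorted(user for user, n in counts.items() if n > max_events)


record_event("alice", "read", "genomes_cohort_a")
for _ in range(5):
    record_event("mallory", "export_attempt", "genomes_cohort_a")

# Flag anyone with more than three export attempts for the TRE owner.
alerts = users_over_threshold(audit_log, "export_attempt", max_events=3)
```

The log answers "who did what, and when", while the threshold check is a stand-in for the anomaly detection that would trigger an automatic alert.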

Data access committee

A Data Access Committee is a group of individuals responsible for reviewing and assessing data access requests. [11] This can promote the benefits of sharing data while reducing potential harm from making data openly available without restrictions. Examples from biobanks include Genomics England’s Access Review Committee and Our Future Health’s Access Board.

User authentication

TREs should have industry-standard user authentication in place to verify the identity of a user attempting to gain access. Some examples include Okta, OAuth and Active Directory.

Segregated Workspaces

To enforce restrictions on data access, the TRE should establish segregated workspaces that apply to users, projects, tools and data. Within workspaces, authorised users can only view the subset of data corresponding to their approved research project.
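As an illustration, a workspace can be modelled as an object that knows its members and the participants approved for its project, and that filters every query accordingly. The class, the project name and the sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    """Hypothetical workspace tied to one approved research project."""
    project: str
    members: frozenset
    approved_participants: frozenset

    def view(self, user: str, rows: list) -> list:
        """Return only rows for participants approved for this project."""
        if user not in self.members:
            raise PermissionError(f"{user} is not a member of {self.project}")
        return [
            r for r in rows if r["participant"] in self.approved_participants
        ]


ROWS = [
    {"participant": "P1", "condition": "rare disease"},
    {"participant": "P2", "condition": "cancer"},
    {"participant": "P3", "condition": "cancer"},
]

cancer_ws = Workspace(
    "proj-cancer", frozenset({"alice"}), frozenset({"P2", "P3"})
)
visible = cancer_ws.view("alice", ROWS)
```

Here the filter lives in the workspace rather than in the researcher's own code, so a user cannot widen their view of the data by changing a query.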

Data Minimisation

In line with the EU General Data Protection Regulation (GDPR), TREs should support data minimisation approaches. This means reducing the information shared about each patient to the minimum needed to conduct the analysis. As an example, a TRE may have one-way ingestion to create analysis-ready data, yet this data does not persist beyond purposes directly relevant to the research.
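In code, data minimisation often amounts to projecting each record onto the fields an approved analysis needs. The sketch below assumes a hypothetical analysis that requires only age and an HbA1c measurement; the records and field names are invented for the example.

```python
def minimise(records: list, approved_fields: set) -> list:
    """Keep only the fields that the approved analysis actually needs."""
    return [
        {k: v for k, v in r.items() if k in approved_fields} for r in records
    ]


records = [
    {"name": "Jane Doe", "postcode": "AB1 2CD", "age": 54, "hba1c_mmol_mol": 48},
    {"name": "John Roe", "postcode": "EF3 4GH", "age": 61, "hba1c_mmol_mol": 53},
]

# Names and postcodes are dropped because the analysis does not need them.
analysis_view = minimise(records, approved_fields={"age", "hba1c_mmol_mol"})
```

Using an allow-list of approved fields, rather than a block-list of sensitive ones, means any new field added upstream stays hidden until it is explicitly approved.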

 

Future-proofing TRE capabilities

The vast majority of existing data management platforms are secure yet largely siloed, with limited ability to combine datasets and effectively pool research resources for analysis. [12][13] There are several key features trusted research environments must have to maximise research utility when working with large-scale data.


Scalability

Biobanks holding hundreds of thousands of datasets can quickly grow to petabytes in volume, which creates challenges with cost, computational resources and storage. Cloud-based Trusted Research Environments can form part of the solution: with the “elastic” nature of cloud computing, TRE owners only pay for the resources they need.


Integration

As data will be ingested into the TRE from a range of sources (e.g., electronic medical records and laboratory information management systems), TREs should be able to integrate with diverse sources and systems.

 

Federation

When integrating data from various sources, it is important to consider the risk and financial costs associated with physically moving data. Federation capabilities simplify the linking of disparate data sources without physically having to move the data itself. Within a federated architecture, data will remain within appropriate jurisdictional boundaries, while metadata is centralised and searchable. 
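One way to picture federation is an analysis in which each site computes aggregate statistics locally and only those aggregates travel to a central coordinator. The sketch below computes a pooled mean in this way; the site names, values and function names are hypothetical, and real federated platforms add authentication, disclosure checks and far richer analyses.

```python
def local_summary(values: list) -> dict:
    """Run at each site: return aggregate statistics, never row-level data."""
    return {"n": len(values), "total": sum(values)}


def combine_means(summaries: list) -> float:
    """Run centrally: pool per-site aggregates into one overall mean."""
    n = sum(s["n"] for s in summaries)
    if n == 0:
        raise ValueError("no observations across sites")
    return sum(s["total"] for s in summaries) / n


# Row-level measurements stay inside their own jurisdiction.
site_a = [5.0, 7.0]
site_b = [6.0, 8.0, 9.0]

pooled_mean = combine_means([local_summary(site_a), local_summary(site_b)])
```

The pooled result matches what a single combined dataset would give, yet neither site has shared an individual record.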


Automated data transformation

Health data comes from a wide range of sources. With this diversity comes wide variability in how data are described and stored, which creates challenges for researchers preparing data for analyses.

TREs need automated systems within the platform to efficiently convert raw data to standardised analysis-ready data. This includes established ETL (Extract, Transform, Load) pipelines and APIs for interfacing between TREs and the data source. FAIRification of data within the trusted research environment further makes data Findable, Accessible, Interoperable, and Reusable with the incorporation of unique identifiers for data and metadata management.
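A very small ETL pipeline might look like the sketch below, which reads raw CSV text, harmonises coding and units, assigns a stable unique identifier to each record and loads the result into an analysis-ready store. The column names, the sex coding and the `example-tre/` identifier prefix are assumptions for illustration, not a real data standard.

```python
import csv
import io
import uuid

# Hypothetical raw extract with inconsistent coding and non-standard units.
RAW_CSV = """patient,sex,height_cm
p001,M,180
p002,f,165
"""

SEX_CODES = {"m": "male", "f": "female"}


def extract(text: str) -> list:
    """Extract: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(row: dict) -> dict:
    """Transform: standardise codes and units and add a unique identifier."""
    return {
        # uuid5 is deterministic, so the same source row keeps the same ID.
        "record_id": str(
            uuid.uuid5(uuid.NAMESPACE_URL, "example-tre/" + row["patient"])
        ),
        "sex": SEX_CODES[row["sex"].strip().lower()],
        "height_m": float(row["height_cm"]) / 100,
    }


def load(rows: list, store: list) -> None:
    """Load: append transformed rows to the analysis-ready store."""
    store.extend(rows)


analysis_ready = []
load([transform(r) for r in extract(RAW_CSV)], analysis_ready)
```

The deterministic identifier is a small nod to the FAIR principles: re-running the pipeline does not mint new IDs, so records remain findable and linkable over time.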

 

End-to-end solution

Once the data is in a usable format, trusted research environments should incorporate built-in analytics to transform the analysis-ready data into insights. Genomics England’s Trusted Research Environment includes integrated, open-source tools to enable researchers to analyse the data that is housed within the Trusted Research Environment.




Advantages of Trusted Research Environments

Health and multi-omics data are of high value for research, yet the scale and sensitivity of this data bring unique challenges for enabling secure data access. Trusted research environments can solve many issues surrounding secure data access in healthcare settings. There are numerous benefits for researchers, organisations, and patients, compared to traditional methods where data is copied and moved.


Key advantages of using a trusted research environment in health data research and management:

  1. Improve collaboration between organisations: TREs enable data access in a secure and controlled environment, supporting collaboration between researchers across different institutions or even countries. With increased access to a wider range of data, researchers can gain new insights and perspectives on the issues they are studying.

  2. Facilitate population-scale studies: Population-scale data is critical to understanding the drivers of disease and identifying patterns and trends in health and illness. TREs can be used to store and process large amounts of patient data, making it possible to conduct research on a much larger scale than would be possible with traditional methods.

  3. Improve clinical trial management: TREs can streamline the process of collecting, storing and sharing sensitive data from clinical trial participants, making it easier for researchers to access, analyse, and share data in a controlled and secure environment. This can lead to more accurate, reliable and efficient clinical trials.

  4. Improve patient outcomes: Better research and more accurate data can enable healthcare professionals to make more informed decisions and provide better patient care. Using a TRE, researchers can uncover new insights into the causes of diseases and develop more effective treatments.

  5. Improve data security: TREs allow approved researchers to securely conduct their work while keeping patient data safe from unauthorised access and potential security breaches. This is particularly important when working with sensitive information, such as genomic data, which can be used to identify individuals. Additionally, TREs provide increased oversight on what data is being used for.

  6. Compliance with regulations: The healthcare industry is heavily regulated, and organisations must comply with laws and guidelines to protect patient data, such as HIPAA, GDPR, and security standards like ISO 27001. A TRE supports organisations in meeting these requirements by providing the necessary controls and oversight to ensure compliance with regulations.

  7. Cost-effective way to provide secure data access: By consolidating data storage and analysis in a single environment, researchers can reduce the costs of maintaining multiple systems and performing data migrations. Additionally, a TRE can help approved users avoid costly data breaches and non-compliance penalties.

  8. Sustainability: In traditional methods of data sharing, data is copied and moved, which requires significant consumption of resources. Using a TRE minimises data duplication and eliminates transfers of files, reducing resource consumption.

 


 


How are Trusted Research Environments Being Used in Healthcare and Research Today?

Across biobanking, governments and health providers, trusted research environments are being increasingly adopted as a means to achieve both data accessibility and security.
 
We highlight some case studies of how trusted research environments are being used across the life sciences industry:



 

 

National Health Service England (NHS England)
Recently, NHS Digital, in partnership with Health Data Research UK, developed a TRE that provides academic researchers access to cardiovascular and cancer data for COVID-19 research. As described in the British Medical Journal, this partnership with national health data custodians provides linked, nationally collated electronic health records for approved research within secure, privacy-protecting environments. [14]

By combining individual-level data across national healthcare settings, data on age, sex, and ethnicity are complete for around 95% of the population. This resource has already proven essential for accurate recording and thus research on cardiovascular disease, providing researchers across the UK with rapid access to data.

 


 

 

Secure Anonymised Information Linkage (SAIL) Databank
The SAIL Databank is a rich population databank whose TRE provides global researchers with secure remote access to anonymised health and social care data records for the population of Wales. [1] In operation since 2007, the SAIL Databank operates on the UK Secure Research Platform, a private research cloud with customisable technology.

Research publications resulting from the databank are in the hundreds - a recent example, in the largest study of its kind, found that COVID-19 vaccines offer effective protection against infection for high-risk healthcare workers. [15]

 


 

 

 

Genomics England
The UK government’s public sector research endeavour, Genomics England currently hosts the data from over 135,000 NHS patients within a TRE for approved research use. The TRE is a cloud-based tool (powered by AWS and Lifebit) that approved researchers can use to access the clinical and genomic data from participants with cancer, rare disease, and COVID-19. With separate data access processes distinguishing the public from the private sector, researchers who want to access data must apply to become a member of either the Genomics England Clinical Interpretation Partnership (academics, students, and clinicians) or the Discovery Forum (industry partners).

 



 

 

Danish National Genome Center
A federated TRE deployed within the Danish National Genome Center’s supercomputing cluster will serve as the scalable and secure data management and analysis platform for Denmark’s national researchers, clinical scientists, and international collaborators. Powered by the Lifebit Platform, the TRE will deliver a next-generation computational infrastructure. The Danish National Genome Center and its collaborators will recruit and sequence whole genomes of 60,000 patients diagnosed with cancer, autoimmune disorders, and rare diseases by 2024.

 

Challenges and Priorities for Trusted Research Environments in the Future

 

Looking to the future, many governments, health systems, and biobanks see TREs as a secure long-term solution for research and clinical use of sensitive health data. 

This is most apparent in the UK, as set out in recent national policy guidance. In 2022, the UK government commissioned an independent review by Professor Ben Goldacre on the use of National Health Service (NHS) health data for research and analysis. This review, and others, have recommended that TREs, or ‘Secure Data Environments’, should be the default way to access health and social care data for R&D going forward.

Yet with a rapidly changing data, regulatory, legal, and technology landscape, TRE owners and suppliers must keep pace with developments to ensure TREs are sustainable into the future. We explore some key priorities and challenges for the future that relate to TREs for health data.


Trusted Research Environment accreditation policies

Countries are increasingly taking measures to protect and retain sovereignty over their national data, with strict national data protection laws and regulatory frameworks governing the movement of patient data limiting transfer between national jurisdictions. [16] 

In line with this, there is an increasing prevalence of accreditation schemes to audit and certify TREs - examples in the UK include the NHS Secure Data Environment and the Our Future Health Trusted Research Environment accreditation processes. The processes will review trusted research environment owners and suppliers to ensure trusted research environments meet the necessary standards across information governance, cyber security, operational, privacy, and technical requirements.  

Implementing accreditation frameworks and regulatory bodies that regulate the use of data can support a safer trusted research environment ecosystem, help foster trust from the broader public, and ensure that the best interests of the public and patients are protected.


Keeping public involvement at the forefront

Conducting meaningful Patient and Public Involvement and Engagement (PPIE) in the design and use of trusted research environments is becoming a best practice to minimise the risks of data misuse and focus research on studies where there is a demonstrable public benefit.
 
There are widespread examples demonstrating how patient and public involvement in decision-making on trusted research environments can lead to improved research output. Maintaining transparency on trusted research environment design and governance procedures is vital to ensure that public trust is maintained to allow long-term success and growth of population health initiatives that will ultimately save lives.

 

Technologies of the future

Amongst the widespread push for greater data protection and patient privacy, there is a need to factor in the knock-on effects for the flow of data access in research. This is where innovative technologies and approaches can bridge this gap and create trusted research environments that are sustainable into the longer term:

  • Federation is widely regarded as a key technology enabler for linking up disparate datasets, including data stored in TREs. [17] Federation across TREs means data can be virtually linked for combined analysis whilst remaining at its source. This means researchers can easily access, collaborate, and analyse disparate datasets without data movement.

  • No code/low code tools are part of a wider industry shift towards software that supports a wider range of end-users. As the majority of the TREs in use today are in a research context, extending their use to clinical, health system, and private sector settings will require a step-change in usability for more diverse end-users.

  • Cloud computing with enterprise infrastructure providers like AWS and Microsoft Azure can not only provide state-of-the-art capabilities in security and storage, but also support the increasing scale of multi-omics and clinical datasets available today. The ‘elastic’ nature of cloud computing means researchers only pay for what they need.

 

 

Conclusion


With the ability to scale with increasing volumes of data, ensure data privacy and protection, and enable secure access for approved research, trusted research environments can serve all ends of the health research community. Enabling valuable research at scale to improve the lives of patients, trusted research environments represent a sustainable and secure long-term solution for managing and using big data.


Editor’s note: This post was originally published on March 28, 2023 and may be occasionally updated for accuracy and comprehensiveness.

 

 

Further reading

Read Lifebit’s white paper on best practices for building a Trusted Research Environment
Read Lifebit’s white paper on security and data governance

 

References

1. Lyons, R. A. et al. The SAIL databank: linking multiple health and social care datasets. BMC Med. Inform. Decis. Mak. 9, 3 (2009).

2. UK Health Data Research Alliance & NHSX. Building Trusted Research Environments - Principles and Best Practices; Towards TRE ecosystems. https://zenodo.org/record/5767586 (2021) doi:10.5281/ZENODO.5767586.

3. Nik-Zainal, P. S. et al. Multi-party trusted research environment federation: Establishing infrastructure for secure analysis across different clinical-genomic datasets. https://zenodo.org/record/7085536 (2022) doi:10.5281/ZENODO.7085536

4. Trusted Research Environment service for England. NHS Digital (2022).

5. Visscher, P. M. et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am. J. Hum. Genet. 101, 5–22 (2017).

6. 4 ways data is improving healthcare. World Economic Forum (2019).

7. Learned, K. et al. Barriers to accessing public cancer genomic data. Sci. Data 6, 98 (2019).

8. Kilzi, Michel. The Anatomy Of Personal Data Sovereignty. Forbes (2021).

9. Thousands of patients hit by NHS data breaches. Independent https://www.independent.co.uk/news/health/data-nhs-patient-breaches-privacy-b1877154.html (2021).

10. Google reportedly mining millions of Americans personal health data. CBS News https://www.cbsnews.com/news/google-mining-millions-of-americans-personal-health-data-report-says/ (2019).

11. Cheah, P. Y. & Piasecki, J. Data Access Committees. BMC Med. Ethics 21, 12 (2020).

12. Denton, N. et al. Data silos are undermining drug development and failing rare disease patients. Orphanet J. Rare Dis. 16, 161 (2021).

13. Koutkias, V. From Data Silos to Standardized, Linked, and FAIR Data for Pharmacovigilance: Current Advances and Challenges with Observational Healthcare Data. Drug Saf. 42, 583–586 (2019).

14. Wood, A. et al. Linked electronic health records for research on a nationwide cohort of more than 54 million people in England: data resource. BMJ n826 (2021) doi:10.1136/bmj.n826.

15. Bedston, S. et al. COVID-19 vaccine uptake, effectiveness, and waning in 82,959 health care workers: A national prospective cohort study in Wales. Vaccine 40, 1180–1189 (2022).

16. Mitchell, C., Ordish, J., Johnson, E., Brigden, T. & Hall, A. The GDPR and genomic data. (2020).

17. Thorogood, A. et al. International federation of genomic medicine databases using GA4GH standards. Cell Genomics 1, 100032 (2021).
