Seven benefits health data standardisation brings to researchers, clinicians and patients
Hannah Gaimster, PhD
Introduction to health data standardisation
In science and healthcare, the amount of data needed to answer important questions keeps expanding. New technologies are driving the creation of large health databases. These technologies include more affordable genome sequencing, digitised medical tools, and expanding electronic health records (EHRs).
These vast datasets can provide important insights and eventually enhance lives. Recent groundbreaking studies that illustrate the power of big data in health research include:
- the 100,000 Genomes study on rare diseases.
- research reporting the host characteristics triggering severe COVID-19 in approximately 60,000 participants.
- research confirming that high blood pressure is a risk factor for dementia. Here, the National Institutes of Health (NIH) All of Us database of EHRs on more than 125,000 participants was used.
This article outlines the value of applying consistent formats and models to create standardised datasets that can be accessed and used for research and innovation in healthcare.
Why standardise data? Health data lacks consistency across sources
Health information can be securely accessed for research and innovation through biobanks, clinical trials, registries, internal studies, and EHRs, among other places. As a result, the data can have a great deal of variation regarding format and content. For instance:
- Health information is provided in various formats, including free text (such as clinician notes) and structured files such as CSV or JSON.
- The same information can be described in different ways across datasets, both between organisations and between geographical locations. For instance, dates may appear as YYYY-MM-DD, DD-MM-YYYY or MM-DD-YYYY.
- Different medical vocabularies for describing symptoms and diseases, such as ICD-10 and SNOMED, can be used in different datasets.
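The date example above can be sketched in a few lines of Python. This is only an illustration: the source names (`site_a`, `site_b`, `site_c`) and the helper `to_iso_date` are hypothetical, and the sketch assumes each dataset declares its own date format, because a value such as 03-04-2023 cannot be reliably identified as DD-MM-YYYY or MM-DD-YYYY from the value alone.

```python
from datetime import datetime

# Hypothetical sources, each declaring the date format it uses.
SOURCE_FORMATS = {
    "site_a": "%Y-%m-%d",  # YYYY-MM-DD
    "site_b": "%d-%m-%Y",  # DD-MM-YYYY
    "site_c": "%m-%d-%Y",  # MM-DD-YYYY
}


def to_iso_date(value: str, source: str) -> str:
    """Parse a date using the format declared for its source and
    return it in a single standard form (ISO 8601, YYYY-MM-DD)."""
    fmt = SOURCE_FORMATS[source]
    return datetime.strptime(value, fmt).date().isoformat()


# The same calendar day, written three different ways.
standardised = [
    to_iso_date("2023-09-14", "site_a"),
    to_iso_date("14-09-2023", "site_b"),
    to_iso_date("09-14-2023", "site_c"),
]
print(standardised)
```

Once every source is converted to one representation, dates from different datasets can be compared and sorted directly.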
Whilst the amount of data available for research is growing, the majority of users of health data (64%) lack the knowledge necessary to standardise data quickly. This results in researchers spending too much time preparing the data for analysis.
Furthermore, according to some estimates, data scientists devote 80% of their work to organising and cleaning data.
Data must be transformed into interoperable formats to solve these health data analysis problems. This process is known as data standardisation.
The solution: health data transformation
Common Data Models (CDMs) are being increasingly utilised in the healthcare sector to overcome the lack of consistency in health data. Examples of clinical data standards are the Observational Medical Outcomes Partnership (OMOP) CDM and the standards developed by the Clinical Data Interchange Standards Consortium (CDISC).
Data can be efficiently merged when standardised to these CDMs, making it more useful than the sum of its components alone. The standard approach to data transformation provided by CDMs makes it possible to share research tools and data throughout different nations, sources, and systems.
Combining and assessing information is much simpler if all health data are organised following a single and consistent standard.
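As a rough illustration of why a shared model simplifies merging, the sketch below maps records from two hypothetical sources onto one set of field names and value conventions. The field names (`person_id`, `sex`, `year_of_birth`), the source names and the helper `to_common_model` are assumptions made for this example. They are not taken from the OMOP CDM or from CDISC, whose real specifications are far more detailed.

```python
# Hypothetical records from two sources that describe the same kind of
# information using different field names and value conventions.
hospital_rows = [{"PatientID": "H001", "Sex": "F", "BirthYear": "1980"}]
registry_rows = [{"subject": "R042", "gender": "male", "yob": 1975}]

# Hypothetical mapping from each source's field names to a shared model.
FIELD_MAPS = {
    "hospital": {"PatientID": "person_id", "Sex": "sex", "BirthYear": "year_of_birth"},
    "registry": {"subject": "person_id", "gender": "sex", "yob": "year_of_birth"},
}

# Hypothetical value mapping so that "F" and "female" become one value.
SEX_VALUES = {"f": "female", "female": "female", "m": "male", "male": "male"}


def to_common_model(row: dict, source: str) -> dict:
    """Rename fields and normalise values so rows from any source match."""
    mapping = FIELD_MAPS[source]
    out = {target: row[original] for original, target in mapping.items()}
    out["sex"] = SEX_VALUES[str(out["sex"]).strip().lower()]
    out["year_of_birth"] = int(out["year_of_birth"])
    out["source"] = source  # keep provenance for later checks
    return out


# After transformation, the two sources can simply be concatenated.
combined = [to_common_model(r, "hospital") for r in hospital_rows]
combined += [to_common_model(r, "registry") for r in registry_rows]
print(combined)
```

The design choice here is to keep all source-specific knowledge in the mapping tables, so adding a new source means adding a mapping rather than changing the analysis code.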
Health data standardisation benefits researchers, clinicians and patients
It is clear that limited or absent health data standardisation stalls research progress. The list below details some key benefits of health data transformation.
- Simplify data management
By using a single CDM for their data, researchers can achieve data harmony. This makes storing the data uniformly in databases easier, freeing up researchers’ time to analyse data and answer their research questions.
- FAIR-ification of data
Health data transformation aligns data with the FAIR principles: data becomes Findable, Accessible, Interoperable and Reusable, which maximises research efficiency, supports translational research and accelerates precision medicine.
- Enhance data quality, consistency and reproducibility
Using consistent data standards helps avoid data integrity issues and improves data quality. When health data is standardised, it is much easier to detect errors and check accuracy, giving researchers more accurate and reliable information for their analyses.
- Enable data linkage
To enhance our understanding of a person's well-being, it is possible to connect medical information with other types of data, such as lifestyle, environmental, or social data. This combination, known as data linkage, allows researchers to comprehensively understand the factors that impact health and disease. However, the ability of researchers to gain new insights from data is limited if it cannot be effectively linked and combined to increase its statistical strength. Common data models are, therefore, crucial to ensuring data is interoperable and can be linked accordingly.
- Increased collaboration
With standardised health data, it is easier to exchange and use data in collaboration with other researchers and clinicians. By transforming data, researchers can ensure their datasets meet the highest global standards and are interoperable with those of other cohorts and organisations. To fully facilitate global collaboration, researchers are increasingly turning to trusted research environments and federated analysis solutions to access and analyse standardised datasets securely.
- Gain novel insights faster
When data is standardised, researchers can compare and analyse datasets in parallel more easily and gain insights sooner. Standardisation also helps them avoid drawing conclusions based on inaccurate or incomplete data, and it streamlines analytics, enabling researchers to transform data into discoveries.
- Ultimately improve patient outcomes
The combined benefits described above all lead to health data being truly interoperable, shareable and usable. By combining data for analysis, researchers can increase the statistical power of their research, enabling quicker clinical applications and improved patient outcomes.
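The data linkage benefit described above can be sketched as a simple join. The example assumes that both datasets already share a standardised identifier field, here the hypothetical `person_id`. The records and the helper `link_on_id` are invented for illustration, and real linkage additionally involves governance, pseudonymisation and matching rules that are not shown.

```python
# Hypothetical, already-standardised datasets sharing a person_id field.
clinical = [
    {"person_id": "P1", "systolic_bp": 150},
    {"person_id": "P2", "systolic_bp": 118},
]
lifestyle = [
    {"person_id": "P1", "smoker": True},
    {"person_id": "P3", "smoker": False},
]


def link_on_id(left: list, right: list, key: str = "person_id") -> list:
    """Combine rows that share the same identifier in both datasets."""
    right_by_key = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the two records into one richer record.
            linked.append({**row, **match})
    return linked


linked = link_on_id(clinical, lifestyle)
print(linked)
```

Only the person present in both datasets appears in the result, which is why consistent identifiers and formats matter: records that cannot be matched cannot contribute to the linked analysis.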
There are many different sources and formats of health information. Only when the data is made interoperable can it be effectively combined to produce new insights. It is essential to standardise health datasets to ensure data quality and accelerate collaboration for maximum insights and discoveries.
Look out for the next blog in our series, where we will describe the technical challenges researchers and clinicians can face when standardising health data and some of the solutions currently being developed.
Author: Hannah Gaimster, PhD
Contributors: Hadley E. Sheppard, PhD and Amanda White