3 Key Strategies for Effective Data Harmonization: Webinar Report

Hannah Gaimster, PhD


April 2024

Author: Hannah Gaimster, PhD

Contributors: Amanda White 




3 Key Strategies for Effective Data Harmonization was the first webinar in Lifebit’s strategies for precision medicine series, held on 26th March 2024. 

The potential research and innovation opportunities offered by health data are unparalleled. However, the World Economic Forum estimates that 97% of healthcare data goes unused, which means the revolutionary potential of this data is still far from being realized.

The webinar focused on methods for harmonizing and curating data. These methods are essential for making health data more usable, so that researchers can analyze global data more quickly and effectively. Experts from industry and academia joined the webinar to share their perspectives on, and experiences of, data standardization, common data models (CDMs) and real-world data (RWD).

Catch up on the webinar here


Lucia Groizard, Flatiron Health, The role of common data models (CDMs) - opportunities and limitations

Lucia Groizard, Flatiron Health, highlighted the role of common data models (CDMs) and their opportunities and limitations.


The webinar kicked off with an insightful talk from Lucia Groizard, Senior Product Manager, Flatiron Health. At Flatiron Health, Lucia leads the development of data solutions that expand the impact of real-world evidence (RWE) in global cancer care and research. Lucia's talk, entitled 'The role of common data models (CDMs) - opportunities and limitations', began by considering which properties harmonized health data must have to be valuable. Catch up on the clip below, which summarizes this.


Lucia then highlighted some of the key challenges around data harmonization, demonstrating that even within the healthcare domain, data can be captured in vastly different ways. She also discussed the 'paradox of choice' around health data terminologies, showing examples of the huge variety of data harmonization approaches that researchers and organizations can take.

Lucia developed this point further by showing that different terminologies vary in their strengths and granularity. She drew an analogy with language: just as some words in one language have no direct translation in another, some concepts do not map directly when data is converted to a CDM. She concluded by introducing the audience to Flatiron's approach to CDMs, explaining how it has evolved over time to serve the needs of its users, and considered some of the key advantages that data standardization can bring to users and patients alike.
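To make the translation analogy concrete, here is a minimal sketch. It assumes a hypothetical fine-grained local staging vocabulary that is mapped to a coarser target CDM vocabulary. The codes and the mapping table are invented for illustration and are not Flatiron's, OMOP's or any real terminology's mappings. The sketch shows two things that can go wrong. First, distinct source concepts can collapse into a single target concept, so granularity is lost. Second, some source codes have no equivalent in the target vocabulary at all.

```python
from collections import defaultdict

# Hypothetical mapping from a fine-grained local vocabulary to a coarser
# target CDM vocabulary. All codes here are invented for illustration only.
SOURCE_TO_TARGET = {
    "LOCAL:STAGE_IIA": "CDM:STAGE_II",
    "LOCAL:STAGE_IIB": "CDM:STAGE_II",
    "LOCAL:STAGE_III": "CDM:STAGE_III",
}


def map_codes(source_codes):
    """Translate source codes into the target vocabulary.

    Returns (mapped, unmapped). mapped holds target codes in input order.
    unmapped lists source codes with no target equivalent, so they can be
    reviewed rather than silently dropped.
    """
    mapped, unmapped = [], []
    for code in source_codes:
        target = SOURCE_TO_TARGET.get(code)
        if target is None:
            unmapped.append(code)
        else:
            mapped.append(target)
    return mapped, unmapped


def lossy_targets(mapping):
    """Return target codes that several distinct source codes collapse into."""
    sources_by_target = defaultdict(set)
    for source, target in mapping.items():
        sources_by_target[target].add(source)
    return {t: sorted(s) for t, s in sources_by_target.items() if len(s) > 1}


if __name__ == "__main__":
    mapped, unmapped = map_codes(["LOCAL:STAGE_IIA", "LOCAL:STAGE_IIB", "LOCAL:STAGE_IV"])
    print(mapped)    # ['CDM:STAGE_II', 'CDM:STAGE_II']
    print(unmapped)  # ['LOCAL:STAGE_IV']
    print(lossy_targets(SOURCE_TO_TARGET))
    # {'CDM:STAGE_II': ['LOCAL:STAGE_IIA', 'LOCAL:STAGE_IIB']}
```

After conversion, the distinction between stages IIA and IIB is no longer recoverable from the target data. This is the kind of "untranslatable word" that a mapping review needs to surface.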



Dr Melissa Haendel, PhD, University of Colorado, Unifying and standardizing the use of health data across organizations

Dr Melissa Haendel, PhD considered unifying and standardizing the use of health data across organizations.


The webinar then moved on to an expert in data harmonization in academia, Dr Melissa Haendel, PhD, Chief Research Informatics Officer and Marsico Chair of Data Science at the University of Colorado. There, Melissa is responsible for using information and information systems to accelerate biomedical discoveries, streamline health system operations and continuously improve patient care.

Melissa's talk, 'Unifying and standardizing the use of health data across organizations', began by highlighting key challenges across the US healthcare sector. Because there is no centralized healthcare provider in the US, there is no centralized source of healthcare data. As a result, data from a single patient is often spread across multiple providers, time zones and geographies.

Melissa discussed a key case study from her research on harmonizing rare disease data. This work has revealed clear differences and patterns in patients' diagnostic journeys across healthcare systems in the US. Discover more details in the full paper linked below.



She highlighted that standardized, interoperable data is amenable to federated data analysis techniques. This is a strong advantage in the US, where healthcare records are fragmented. Watch Melissa discuss the key advantages of a federated approach in more detail in the clip below.
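The sketch below illustrates why harmonization enables federation. It shows one common federated pattern, aggregate queries, and assumes that every site has already harmonized its records to the same schema and concept codes. The site names, record layout and concept codes are hypothetical. The sketch is not a description of any specific platform discussed in the webinar, and real federated systems add authentication, privacy controls such as small-count suppression, and richer statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A patient-level record in a shared, harmonized schema (hypothetical)."""
    person_id: str
    concept: str  # standardized concept code, identical across sites after harmonization


def local_count(records, concept):
    """Runs inside a site: count distinct patients who have the concept.

    Only this integer leaves the site. The patient-level records stay local.
    """
    return len({r.person_id for r in records if r.concept == concept})


def federated_count(sites, concept):
    """Runs at the coordinator: combine per-site aggregates into a total.

    In this single-process simulation the coordinator calls local_count
    directly. In a real deployment the query would be sent to each site.
    """
    per_site = {name: local_count(records, concept) for name, records in sites.items()}
    return per_site, sum(per_site.values())


if __name__ == "__main__":
    sites = {
        "site_a": [Record("a1", "CONCEPT:X"), Record("a1", "CONCEPT:X"), Record("a2", "CONCEPT:Y")],
        "site_b": [Record("b1", "CONCEPT:X"), Record("b2", "CONCEPT:X")],
    }
    print(federated_count(sites, "CONCEPT:X"))  # ({'site_a': 1, 'site_b': 2}, 3)
```

The same query is only meaningful at every site because "CONCEPT:X" means the same thing everywhere, which is exactly what harmonization provides. Note that summing per-site counts assumes each patient appears at only one site. With fragmented US records, a patient seen at two providers would be counted twice unless records are linked across sites.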





Melissa emphasized the complex public, private and government partnerships that were forged to establish the National COVID Cohort Collaborative (N3C). Data from more than 60 healthcare facilities across the US was harmonized into a single format. This allowed academics and physicians working inside the N3C Data Enclave to analyze COVID-19 data and possible treatments as the pandemic developed. The N3C Data Enclave is a secure, cloud-based research environment that protects N3C's data and offers a robust analytics platform.

Melissa provided valuable insights into how the N3C performs data harmonization at scale and in an agile way, with continuous monitoring and quality assurance. The N3C uses the Observational Medical Outcomes Partnership (OMOP) CDM as the target model for all data. However, Melissa underscored, as Lucia had in the previous talk, that different CDMs can express the same concept in different ways. Because the N3C could harmonize and combine large datasets, it was able to provide the earliest and most representative data to predict long COVID risk and to help inform health policy.
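Continuous quality assurance on harmonized data often starts with simple, repeatable checks run on every data refresh. The sketch below is a minimal example under stated assumptions, not the N3C's actual pipeline, which is not described here. It uses three column names from the OMOP CDM condition_occurrence table (person_id, condition_concept_id and condition_start_date) and relies on the OMOP convention that a concept ID of 0 means no standard concept was found. The 95% mapping threshold and the example concept ID 12345 are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ConditionRow:
    """A subset of columns from an OMOP-style condition_occurrence table."""
    person_id: int
    condition_concept_id: int  # 0 conventionally means "no standard concept found"
    condition_start_date: Optional[date]


def quality_report(rows, min_mapped_fraction=0.95):
    """Summarize mapping coverage and missing dates for one data refresh.

    min_mapped_fraction is a hypothetical acceptance threshold.
    """
    total = len(rows)
    if total == 0:
        # An empty refresh is treated as a failure rather than a pass.
        return {"rows": 0, "mapped_fraction": 0.0, "missing_start_date": 0, "passes": False}
    mapped = sum(1 for r in rows if r.condition_concept_id != 0)
    missing_dates = sum(1 for r in rows if r.condition_start_date is None)
    mapped_fraction = mapped / total
    return {
        "rows": total,
        "mapped_fraction": mapped_fraction,
        "missing_start_date": missing_dates,
        "passes": mapped_fraction >= min_mapped_fraction and missing_dates == 0,
    }


if __name__ == "__main__":
    rows = [
        ConditionRow(1, 12345, date(2021, 3, 1)),  # 12345 is a hypothetical concept ID
        ConditionRow(2, 0, date(2021, 4, 2)),      # unmapped source value
        ConditionRow(3, 12345, None),              # missing start date
        ConditionRow(4, 12345, date(2021, 5, 3)),
    ]
    print(quality_report(rows))
    # {'rows': 4, 'mapped_fraction': 0.75, 'missing_start_date': 1, 'passes': False}
```

Running a check like this on every incoming batch, and tracking the results per site over time, is one simple way to turn "continuous monitoring" into something measurable.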

She concluded by summarizing the crucial need for collaborative team science to power data harmonization efforts, analyze global datasets and ultimately improve patient outcomes.




Dr Sandeep Pawar, PhD, Verana Health, Data standardization in research, drug discovery and clinical trials


Dr Sandeep Pawar, PhD, Head of Ecosystem Partnerships, Verana Health, discussed data standardization in research, drug discovery and clinical trials.

The final presentation of the day was given by Dr Sandeep Pawar, PhD, Head of Ecosystem Partnerships, Verana Health. With deep expertise in biopharma R&D, RWD and data analytics, Sandeep leads Verana Health's partnerships for RWD and data products.

Sandeep's talk, 'Data standardization in research, drug discovery and clinical trials', took a different angle. He focused on key pharma use cases for harmonized data rather than the technical and research-based aspects covered by Lucia and Melissa, respectively. He emphasized that data should be both useful and usable, and that data standardization can help meet both needs. Watch his summary below:

Sandeep commented that for pharma companies to get the most out of RWD in clinical research and trials, the data must first be standardized and interoperable. He added that regulatory bodies will increasingly require data standardization approaches before approving the use of RWD.

Sandeep concluded by speculating on the potentially transformative role that AI can play in aiding development of data standardization approaches. Understand how this might be achieved in the future by watching the short clip below.



Overall, the webinar underscored the importance of collaborative efforts, standardized data models, and interoperability in harnessing the full potential of health data for improving patient outcomes and advancing medical research and innovation.

Don’t miss out on our next webinar: 3 strategies for RWD challenges in clinical research and trials

Sign up for our newsletter to ensure you hear about the next webinar in our series!




About Lifebit


At Lifebit, we develop secure federated data analysis solutions for clients including Genomics England, NIHR Cambridge Biomedical Research Centre, Danish National Genome Centre and Boehringer Ingelheim to help researchers turn data into discoveries.


Interested in learning more about Lifebit’s federated data solution for genomics research?
