Closing the gap in health data diversity to benefit patients and produce global insights

3 minute read
Chiara Banas, PhD


September 2023


Introduction to Data Diversity


In 2003, the Human Genome Project identified that 99.9% of human DNA is identical between individuals. This enabled researchers to use a reference human genome, which helped build knowledge and understanding of human diseases and laid the foundations for precision, or personalised, medicine. However, the 0.1% that varies between individuals helps explain why one person can be more susceptible to a certain disease than another, and why people can respond differently to the same drug or treatment.



Understanding the genetic variation between individuals is therefore important, both to offer more accurate insights and to create greater equity in access to therapies and treatments for everyone. This article considers opportunities to close the gaps in health data diversity to benefit patients everywhere, with a specific focus on the role that health data standardisation can play.


The problem: There is a lack of diversity in health data


Since the first human genome was sequenced, most studies of genetic associations with disease have been performed in people of European ancestry. As a result, these datasets represent only one group of people and lack representation of other populations. Relying on data from a single population group can prevent researchers from building accurate models and forming complete insights.

This can lead to biased conclusions and affect data-driven decisions and processes. Furthermore, studying incomplete data can reinforce and perpetuate biased beliefs and health inequality. The negative impacts can include:

  • disparities in healthcare
  • misdiagnoses
  • inadequate treatments


The good news: Data is becoming more diverse


The good news is that experts, organisations, companies and research groups around the world are driving change to champion diverse and inclusive health data for research. So much so that some funding bodies are now making this a requirement for the research they fund.

Biotech companies, such as Gen-t in Brazil and in Mexico, are aiming to sequence the Latin American population, which is a historically underrepresented group in genomic and health studies.



Closing the gap in data diversity is especially important as artificial intelligence develops. For example, AI models that detect skin cancer should be trained on images covering a diverse range of skin tones, rather than mainly lighter skin tones. A more complete dataset leads to more accurate and less biased medical insights.


How do we tackle closing the gap in health data diversity?


The answer - health data transformation, data interoperability and secure data access


  • Solution - gather more data:

The first hurdle in closing the gap in health data diversity is to include diverse populations in studies and gather data from these different groups. This means first identifying where data is lacking and then focusing study recruitment on the population groups that are underrepresented. For example, Brazil has one of the most diverse populations in the world, yet remains heavily underrepresented in genomic studies.
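The first step, identifying where data is lacking, can be sketched as comparing each group's share of a study cohort with its share of a reference population. This is a minimal illustration under stated assumptions: the group labels, counts, population shares and the 0.5 cut-off are all hypothetical, and real analyses need careful choices about how population or ancestry groups are defined.

```python
from typing import Dict


def representation_gaps(
    cohort_counts: Dict[str, int],
    population_shares: Dict[str, float],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Flag groups that are underrepresented in a study cohort.

    For each group, the representation ratio is its share of the cohort
    divided by its share of the reference population. Groups whose ratio
    falls below `threshold` (a hypothetical cut-off) are returned.
    """
    total = sum(cohort_counts.values())
    if total == 0:
        raise ValueError("cohort is empty")
    gaps = {}
    for group, pop_share in population_shares.items():
        if pop_share <= 0:
            continue  # the ratio is undefined for groups absent from the reference
        cohort_share = cohort_counts.get(group, 0) / total
        ratio = cohort_share / pop_share
        if ratio < threshold:
            gaps[group] = round(ratio, 2)
    return gaps


# Hypothetical cohort of 1,000 participants and reference population shares.
cohort_counts = {"group_a": 750, "group_b": 200, "group_c": 50}
population_shares = {"group_a": 0.5, "group_b": 0.3, "group_c": 0.2}
gaps = representation_gaps(cohort_counts, population_shares)
print(gaps)  # {'group_c': 0.25}
```

Here `group_c` makes up 20% of the reference population but only 5% of the cohort, so it is flagged; recruitment could then be focused on the flagged groups.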

A notable challenge for this solution is securing funding from governmental bodies and research organisations to study underrepresented groups.


  • Solution - standardisation of health data for interoperability:

Standardising health data helps to establish consistent formats and allows for interoperability with other existing data. Having standardised data allows researchers to compare and analyse data from diverse sources and populations.

Data can be standardised to a common data model, such as the Observational Medical Outcomes Partnership (OMOP) Common Data Model. Standardised data is interoperable because it uses consistent terminology and protocols for collecting, storing and sharing information.

Having data standardised to a common data model enables researchers to maximise the insights gained from these data and identify trends and disparities in genomic data.
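As a minimal illustration of what standardising to a common data model involves, the sketch below maps records from two hypothetical sources, which encode sex differently, onto a simplified subset of the OMOP `person` table. The source names, local codes and mapping dictionaries are hypothetical. The concept IDs 8507 (MALE) and 8532 (FEMALE) are OMOP standard gender concepts and 0 is the OMOP convention for "no matching concept", but a real pipeline would look mappings up in the OHDSI standardised vocabularies rather than hard-coding them.

```python
from dataclasses import dataclass

# OMOP standard gender concepts; concept ID 0 means "no matching concept".
MALE, FEMALE, NO_MATCH = 8507, 8532, 0

# Hypothetical per-source mappings from local codes to OMOP concept IDs.
SOURCE_MAPPINGS = {
    "hospital_a": {"M": MALE, "F": FEMALE},
    "biobank_b": {"1": MALE, "2": FEMALE},
}


@dataclass
class PersonRow:
    """A simplified subset of the OMOP CDM `person` table."""
    person_id: int
    gender_concept_id: int
    gender_source_value: str  # the original local code, kept for auditing
    year_of_birth: int


def to_omop(source: str, person_id: int, sex_code: str, year_of_birth: int) -> PersonRow:
    """Map one source record to a standardised row."""
    concept_id = SOURCE_MAPPINGS[source].get(sex_code, NO_MATCH)
    return PersonRow(person_id, concept_id, sex_code, year_of_birth)


rows = [
    to_omop("hospital_a", 1, "F", 1980),
    to_omop("biobank_b", 2, "2", 1975),
    to_omop("biobank_b", 3, "9", 1990),  # unknown local code maps to concept 0
]
for row in rows:
    print(row.person_id, row.gender_concept_id, row.gender_source_value)
```

Once both sources use the same concept IDs, a query such as "count female participants" works identically across them, which is the interoperability that standardisation provides.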


  • Solution - maximising secure data access globally:

One way to help tackle the lack of diverse data is to provide secure data access globally through federated analysis and trusted research environments. This approach allows distributed data to be accessed, linked and analysed safely and securely without moving it from where it is held.
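To illustrate the core idea of federated analysis, the minimal sketch below has each site compute only summary statistics (a count and a sum) over its own records; a coordinator then combines these aggregates into a pooled mean. The site names, values and the `MIN_COUNT` disclosure rule are hypothetical, and for demonstration all sites run in one script. Real federated platforms and trusted research environments add access controls, output checking and auditing, none of which is modelled here.

```python
from typing import List, NamedTuple, Optional

# Hypothetical disclosure-control rule: a site releases nothing if it holds
# fewer than this many records.
MIN_COUNT = 5


class Aggregate(NamedTuple):
    count: int
    total: float


def local_summary(values: List[float]) -> Optional[Aggregate]:
    """Runs inside each site: only a count and a sum leave the site."""
    if len(values) < MIN_COUNT:
        return None
    return Aggregate(count=len(values), total=sum(values))


def pooled_mean(summaries: List[Optional[Aggregate]]) -> float:
    """Runs at the coordinator: combines site aggregates into one mean."""
    released = [s for s in summaries if s is not None]
    n = sum(s.count for s in released)
    if n == 0:
        raise ValueError("no site released enough data")
    return sum(s.total for s in released) / n


# Hypothetical values for one numeric measurement held at three sites.
site_data = {
    "site_1": [5.1, 5.4, 6.0, 5.8, 5.5],
    "site_2": [5.0, 5.2, 5.3, 4.9, 5.6, 5.1],
    "site_3": [6.2, 6.4],  # below MIN_COUNT, so it releases nothing
}
summaries = [local_summary(values) for values in site_data.values()]
print(f"Pooled mean across sites: {pooled_mean(summaries):.3f}")
```

The pooled mean equals the mean of all released records, even though no individual record was moved to the coordinator.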


The benefits of closing the gap in health data diversity


  • Improving diagnostics - having a more complete picture of health data increases accuracy of diagnoses.
  • Improving treatment - treatments can be made more accurate and tailored for specific populations and even to the individual level in precision medicine.
  • Inform public health policies - decision makers can be better informed to create policies that cater to the needs of diverse and broader populations.  
  • Global collaboration - encourages groups and organisations worldwide to collaborate, share knowledge and innovate.
  • Rare diseases - benefit patients with rare diseases through precision medicine and orphan drugs, including drugs identified through repurposing.


Having diverse patient data will lead to better diagnostics, treatments and public health policies, support precision medicine development and the identification of orphan drugs for rare diseases, and foster global collaboration.




A lack of health data diversity can lead to negative outcomes such as disparities in healthcare, misdiagnoses and inadequate treatments. Closing this gap will benefit patients everywhere and lead to more accurate insights. This can be achieved by including more diverse data, standardising data for interoperability, and maximising secure data access globally.

Look out for the next blog in our series, where we will describe health data transformation.


Author: Chiara Banas, PhD

Contributors: Hannah Gaimster, PhD and Amanda White



About Lifebit


Lifebit provides health data standardisation services for clients, including Genomics England, Boehringer Ingelheim, Flatiron Health and more, to help researchers transform data into discoveries.

Lifebit’s services are making health data usable quickly.  

