With 2022 now behind us, we’d like to explore trends in healthcare in 2023 to predict the top use cases for the year. But first, let’s investigate the potential usage of synthetic data in healthcare.

Using data in healthcare inherently poses risks and challenges, because patient data is sensitive and those who handle it are expected to keep it private. Introducing synthetic data in healthcare could mitigate these challenges and advance the use of AI in healthcare. Patient data that has been completely anonymized can be shared with other institutions and used to create internal insights, build research models, and run projects for the benefit of patients, amongst many other things.

So, before we dive in, what is synthetic data? Diveplane’s synthetic data replicates the statistical properties of actual patient data and the relationships between features of the real dataset, all without including any of the original sensitive data. In other words, you can clone or manipulate the statistical properties of the data’s population without cloning any individual in the dataset. Diveplane’s synthetic data engine, GeminAI™, ensures that duplicate patient records aren’t emitted in the synthetic data.

This is one of the many benefits of synthetic data: it can be generated to avoid releasing any PII (Personally Identifiable Information), and the individual records are different enough that they couldn’t be reidentified or matched by someone holding other data, such as payment dates and amounts. PII is any information that can lead to the identification of the individual to whom it belongs, such as a full name, social security number, address, gender, or birth date.

In addition to keeping PII safe, synthetic data also keeps PHI (Protected Health Information) secure, while still allowing research to progress on a “statistical twin” of the dataset. PHI covers patients’ medical records, which are protected under HIPAA (the Health Insurance Portability and Accountability Act). As one example, HIPAA protects an individual’s health records from being improperly used or disclosed by hospitals, doctors, etc.
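To make the “statistical twin” idea concrete, here is a minimal sketch in Python. It is not how GeminAI™ works; it assumes a deliberately simple stand-in technique (fitting a two-feature Gaussian and sampling from it), and the “real” records are invented for illustration. The helper names `fit` and `sample` are hypothetical.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def fit(records):
    """Summarize two numeric features: means, standard deviations, correlation."""
    a = [r[0] for r in records]
    b = [r[1] for r in records]
    sd_a, sd_b = math.sqrt(covariance(a, a)), math.sqrt(covariance(b, b))
    return mean(a), mean(b), sd_a, sd_b, covariance(a, b) / (sd_a * sd_b)


def sample(params, n, rng):
    """Draw n new records from a bivariate Gaussian with the fitted statistics."""
    mu_a, mu_b, sd_a, sd_b, rho = params
    records = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a = mu_a + sd_a * z1
        b = mu_b + sd_b * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        records.append((round(a, 1), round(b, 1)))
    return records


rng = random.Random(0)

# Made-up "real" records: (age, systolic blood pressure). Not actual patient data.
real = []
for _ in range(500):
    age = rng.gauss(55, 12)
    real.append((round(age, 1), round(90 + 0.6 * age + rng.gauss(0, 8), 1)))

real_params = fit(real)
real_set = set(real)

# Sample from the fitted statistics and drop any record that happens to copy a real one.
synthetic = [rec for rec in sample(real_params, 500, rng) if rec not in real_set]
synthetic_params = fit(synthetic)

print("real      mean/sd/corr:", [round(p, 2) for p in real_params])
print("synthetic mean/sd/corr:", [round(p, 2) for p in synthetic_params])
```

The two printed summaries should be close to each other even though no synthetic record is copied from the real set. Dropping exact copies is only the most basic safeguard; on its own it does not guarantee that records cannot be reidentified.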

HIPAA and other data regulations can make it difficult to access and use patient data, especially across administrative and national boundaries, even though using that data could lead to revolutionary discoveries. Because synthetic data can be generated so that it does not reveal private information about actual patients, it allows for innovation on that data without exposing patients’ identities or their health records. Properly generated synthetic data is significantly more advantageous than simply masking your data, as the privacy risks are considerably lower.
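Why is masking alone risky? A small sketch, with entirely invented records, shows how a masked table can be matched against outside data that still carries names. The function `link` and the field names are hypothetical, chosen only for this illustration.

```python
def link(masked_rows, auxiliary_rows, keys):
    """Re-identify masked rows whose quasi-identifiers match exactly one known person."""
    index = {}
    for person in auxiliary_rows:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = []
    for row in masked_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match pins the record to one person
            matches.append((names[0], row["diagnosis"]))
    return matches


# All values below are invented for illustration.
masked = [
    {"name": "***", "zip": "27601", "birth_year": 1961, "diagnosis": "type 2 diabetes"},
    {"name": "***", "zip": "27601", "birth_year": 1984, "diagnosis": "asthma"},
    {"name": "***", "zip": "27513", "birth_year": 1975, "diagnosis": "hypertension"},
]
auxiliary = [  # e.g. an outside list that still carries names
    {"name": "Person A", "zip": "27601", "birth_year": 1961},
    {"name": "Person B", "zip": "27513", "birth_year": 1975},
    {"name": "Person C", "zip": "27513", "birth_year": 1975},
]

reidentified = link(masked, auxiliary, keys=("zip", "birth_year"))
print(reidentified)
```

Here the first masked row is tied back to “Person A” even though the name column was hidden, because the zip code and birth year together are unique. The third row stays ambiguous because two people share those values. Synthetic records that do not correspond to any real individual are meant to avoid this kind of match.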

Breaching data privacy regulations is costly. According to the HIPAA Journal, healthcare data breaches are the costliest of any sector, with the average cost rising by about $2 million to roughly $9 million per incident. To put things into perspective, consumer goods data breaches cost roughly $1.93 million per incident. As healthcare organizations face these costs while searching for effective data analysis, more and more of them are looking for a less costly alternative that lets them use and democratize their data.

In healthcare, synthetic data offers the hope of overcoming many of the data issues that professionals face. With a synthetic data tool such as GeminAI™, healthcare organizations can ensure that the required levels of privacy are in place for any audit, and healthcare professionals can research high-quality synthetic data and create unlimited use cases for insights and sharing. Before healthcare companies can fully embrace a synthetic data tool, they must confirm that it meets their own specific privacy criteria, which is crucial in healthcare and is what enables internal research on data sets. They also need guidance on how to create, use, and assess their own synthetic data in a way that complies with privacy laws and can pass any audit with flying colors. This leads us into our first suggestion:

1. Rare disease use cases

Collecting data for rare disease use cases is much harder, since cases are few and the data is not as varied. By using anonymized and synthetic data, researchers and data scientists can create data for underrepresented patient subgroups, such as patients with rare diseases.

Reducing bias is what every data scientist aims to do, right? Well, algorithms tend to be biased toward specific types of patients and cater less to patients who are underrepresented, such as those with rare diseases. Because this data tends to be scarce, we predict rare disease use cases will be amongst the top trending use cases in healthcare in 2023. With synthetic data, researchers can create control groups for critical rare disease clinical trials, mitigate bias, and represent patients who may not have been selected for other clinical trials, which typically cater to patients outside the “rare diseases” category.

2. Using synthetic data to improve machine learning models

Machine learning models are extremely helpful in improving dependability and reliability in healthcare. In certain instances, machine learning models can be trained on large samples of high-quality data. Large samples of synthetic health data can be pivotal in training these models; as a result, the algorithms will be able to produce new outcomes and assist with:

  1. Exploring rare diseases (as mentioned above)
  2. Improving the processes of discovering new vaccines
  3. Discovering new illnesses, diseases, and pandemic health concerns (such as COVID)

3. Use cases where synthetic data can be used with very little training data

Clinical trials in healthcare can be difficult to use for data analytics, as patient data samples may be small and their applicability may be limited. For example, if a clinical trial has a small sample size or recruiting participants is troublesome, it can be challenging to stay data driven, and the patients who did agree to participate may underrepresent some groups. Synthetic data can bridge this gap by completing such data sets and increasing overall data availability, allowing healthcare organizations to conduct large data set analyses that lead to new findings.

There are many benefits of using synthetic data in healthcare. Various sectors that utilize synthetic data all benefit in different ways. Specifically, healthcare organizations can benefit from using synthetic data by:

  1. Creating a “statistical twin” of the original data set while leaving PII and PHI out. This is a safer method that gatekeeps sensitive information to uphold the privacy and confidentiality requirements of HIPAA.
  2. Facing far fewer restrictions on secondary use and processing when the synthetic data is properly generated.
  3. Not requiring patient consent once the synthetic data is produced.
  4. Working with high-quality data: synthetic data properly generated from real patient data can precisely advance compliance, AI modeling, and pattern identification.
  5. Training machine learning models on synthetic data that represents specific conditions that didn’t occur in the real data sets.

Synthetic data in healthcare in a nutshell: The best thing about properly generated synthetic data is that it does not contain any of the original personal information. Because of this, companies will be able to navigate regulations on personal data, including the new EU AI Act and the recommendations of the U.S. Blueprint for an AI Bill of Rights. Properly generated synthetic data can be used for any kind of analysis you wish, such as clinical trial investigation, medical research, or really any other kind of medical study. There are many ways to reidentify data, so synthetic data should take advantage of a wide and diverse set of privacy and anonymity techniques, including differential privacy and others. With the innovation that stems from synthetic healthcare data, healthcare organizations can finally modernize medical analyses and deliver more cost-efficient, personalized medicine.
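For readers curious about differential privacy, mentioned above, here is a minimal sketch of one of its basic building blocks, the Laplace mechanism, applied to a simple count. It is an illustration under stated assumptions, not a description of any product: the ages are randomly invented, and `private_count` and `laplace_noise` are hypothetical helper names.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) noise, as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy via the Laplace mechanism."""
    true_count = sum(1 for r in records if predicate(r))
    # A count has sensitivity 1: one patient joining or leaving changes it by at most 1.
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(0)
# Invented ages standing in for a patient table.
ages = [rng.randint(18, 90) for _ in range(1000)]

true_count = sum(1 for a in ages if a >= 65)
noisy_count = private_count(ages, lambda a: a >= 65, epsilon=0.5, rng=rng)
print(true_count, round(noisy_count, 1))
```

A smaller `epsilon` adds more noise and gives stronger privacy; a larger one keeps the answer closer to the true count. The published number stays useful for analysis while making it hard to tell whether any single patient is in the data.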