By Dov Lieber, Wall Street Journal | April 6, 2021
Researchers in Israel were happy to get their hands on data about thousands of Covid-19 patients, including a 63-year-old father of two who was admitted to the emergency room with Covid-19 and soon recovered. It was the early days of the coronavirus pandemic, and the treatments used for this patient could provide invaluable insight into the then-little-understood virus.
Normally, it would have been unthinkable to share sensitive medical details, such as the patient’s use of Lipitor for high cholesterol, so quickly, without taking measures to safeguard his privacy. But this man wasn’t real. He was a fake patient created by algorithms that take details from real-life data sets such as electronic medical records, scramble them and piece them back together to create artificial patient populations that largely mirror the real thing but don’t include any real patients.
Medical researchers and data scientists say such “synthetic” healthcare data has the potential to speed up medical innovation. The rapid digitization of health records has created troves of patient information that can be analyzed by algorithms and harnessed to improve disease-treatment models and develop new products and services. But patient information isn’t easy to get: privacy laws require medical data to be stripped of names, addresses and other identifying details before it can be shared, a process that can take months. Even those measures don’t satisfy some privacy advocates, who point to studies showing that it is possible to re-identify patients even after data sets have been anonymized.
Enter synthetic-data technology.