Banking On Big Data For Biomedical Insights – Precision Medicine In The Era of Big Data And Biobanks

Fifteen years ago, researchers observed that French families with very high LDL cholesterol levels carried a mutation in the gene coding for the proprotein convertase subtilisin/kexin type 9 (PCSK9) enzyme, making PCSK9 more active. PCSK9 promotes the destruction of the LDL cholesterol receptors on cells; these receptors help to remove LDL cholesterol from the blood.

On the other hand, another group of researchers identified individuals in the Dallas Heart Study who had very low LDL cholesterol and seemed to be protected from heart disease; they had a different mutation in the PCSK9 gene that made PCSK9 less active. Based on these findings, researchers developed antibodies that inhibited PCSK9.¹ Today, PCSK9 inhibitors are a new class of effective cholesterol drugs which, when added to statins, lower cholesterol levels and prevent heart disease, almost exactly as predicted by the genetic association studies.

The discovery of PCSK9 inhibitors illustrates how big healthcare data (both clinical and genomic) can help to improve patient care by identifying novel drug targets. Much of this potential comes from the big healthcare data that is linked to biological samples in human biobanks.

A century ago, university researchers would occasionally store small collections of human biological samples for specific research projects.² These early biobanks included a few pieces of information, such as the date of collection and clinical or disease data (also known as phenotypic data).

Today, biobanks have evolved both in scale and complexity, encompassing samples from many more people. In addition, advancing digital technologies have made it possible to store and link together large amounts of different types of data, including imaging, genomic and electronic health records (EHR) data. This has greatly expanded the quantity and diversity of data associated with the biological samples in biobanks.

On the larger end of the scale, many countries have established population biobanks comprising samples and associated clinical and biological (including genomic) data from hundreds of thousands of healthy and sick individuals. Examples are BioBank Japan, the UK Biobank and the All of Us biobank, the last of which aims to collect and store samples and data from 1 million people.³⁻⁵ Some university biobanks stand shoulder to shoulder with these population biobanks, with Vanderbilt University’s BioVU boasting more than 250,000 DNA samples that are linked to the corresponding patient EHRs.⁶

Biobanking the Ethical Way

Much of a biobank’s value resides in the people who agree to contribute their samples and data to it. However, in light of recent data breaches, some people may be reluctant to participate because of concerns about how their samples and data will be used and protected. Biobanks need to address these and other ethical concerns in order to recruit participants and maintain their relevance.

What Are Biobanks Good For?

Biobanks and their associated data have many potential applications in healthcare. By combining biological data with diverse phenotypic data, investigators can discover novel drug targets, as well as develop diagnostic and treatment strategies that are tailored to specific groups of patients.

In addition, some genes, called pleiotropic genes, are associated with multiple diseases, and these associations are usually difficult to identify. Investigators can use the linked genetic and phenotypic data maintained by biobanks to identify the multiple diseases affected by the same gene.⁷
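As a rough illustration of how linked genetic and phenotypic data could be scanned for several diseases at once, the sketch below compares carriers and non-carriers of one variant across every recorded diagnosis. It is only a toy under stated assumptions: the records, diagnosis labels and the function names `odds_ratio` and `phenome_scan` are invented for this example, and a real phenome-wide scan would add significance testing and correction for multiple comparisons.

```python
from collections import defaultdict

# Hypothetical toy records: (participant_id, carries_variant, set of diagnoses).
# All values are invented for illustration only.
RECORDS = [
    ("p1", True, {"high_ldl", "heart_disease"}),
    ("p2", True, {"high_ldl"}),
    ("p3", True, {"high_ldl", "heart_disease"}),
    ("p4", False, {"asthma"}),
    ("p5", False, set()),
    ("p6", False, {"high_ldl"}),
    ("p7", False, {"asthma"}),
    ("p8", False, set()),
]


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table. Adds 0.5 to every cell
    (Haldane-Anscombe correction) so empty cells do not divide by zero."""
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


def phenome_scan(records):
    """Return {diagnosis: odds ratio of that diagnosis in carriers vs non-carriers}."""
    carriers = sum(1 for _, carries, _ in records if carries)
    non_carriers = len(records) - carriers
    # diagnosis -> [cases among carriers, cases among non-carriers]
    counts = defaultdict(lambda: [0, 0])
    for _, carries, diagnoses in records:
        for dx in diagnoses:
            counts[dx][0 if carries else 1] += 1
    return {
        dx: odds_ratio(c_cases, carriers - c_cases, n_cases, non_carriers - n_cases)
        for dx, (c_cases, n_cases) in counts.items()
    }


# Print diagnoses from most to least enriched among carriers.
for dx, ratio in sorted(phenome_scan(RECORDS).items(), key=lambda kv: -kv[1]):
    print(f"{dx}: odds ratio {ratio:.2f}")
```

In this toy data set the variant is enriched for both high LDL cholesterol and heart disease (odds ratios well above 1) but not for asthma (below 1), which is the kind of pattern that would flag a gene as a candidate for pleiotropy.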

They could also use the data to study the potential impact of a particular treatment strategy on multiple linked diseases and identify additional benefits or potential side effects. This knowledge can lead to the selection of better candidate treatments for further development, potentially saving billions of dollars that would otherwise be spent developing treatments that eventually fail.⁸

However, setting up a useful biobank and big data resource is not a trivial undertaking. Investigators at the NUHS Centre for Precision Health (CPH), which set up the NUHS biobank PHEN-GEN (short for “Phenotype-Genotype”), knew they would have to overcome significant challenges to create an effective resource.

The PHEN-GEN team comprises investigators who are interested in the clinical application of genetics. They are clinician-scientists from the NUS Yong Loo Lin School of Medicine and its affiliated academic health system, the National University Health System (NUHS), including Professors Tai E Shyong, Goh Boon Cher, Lee Soo Chin, Mark Chan, and Adrian Low, and Drs Teng Gim Gee and Peter Cheung; the National University Hospital’s Chief Technology Officer and big data expert Assistant Professor Ngiam Kee Yuan; and health economist Assistant Professor Wee Hwee Lin.

Modelled on Vanderbilt’s BioVU biobank, PHEN-GEN aims to initially collect and store 10,000 blood samples from outpatients at various NUH specialty clinics. This biobank will be an integral part of the Precision Medicine Strategic Research Programme, one of nine new strategic programmes at NUS Medicine that will bring basic scientists and clinicians together to meet common biomedical goals.

Facing Hurdles

Two immediate challenges are the storage capacity needed for the large number of samples, and the vast computing power required for big data analysis. Both of these elements may be difficult for individual researchers to afford on their own. PHEN-GEN is meeting the first challenge by turning to the NUH Tissue Repository, which stores millions of samples according to industry best practices. To access the required computing power, the PHEN-GEN biobank will tap into DISCOVERY AI, an NUHS platform that connects multiple machine learning systems to research and clinical databases (including the NUH patient EHR system) for big data analysis and the development of machine learning tools. DISCOVERY AI will link the PHEN-GEN blood samples with phenotypic data from the corresponding patient EHRs and genetic data, if available.

A third challenge is that different types of data often come in different formats. This includes doctors’ notes, that is, the information on aspects such as diagnosis and treatment that clinicians enter. In order to analyse the data properly and use the results to make accurate predictions about, for example, disease risk and treatment response, the data needs to be standardised and harmonised after it is collected. PHEN-GEN meets this challenge by applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model to EHR data, which transforms the various types of data into a consistent format.
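To make the idea of harmonisation concrete, here is a minimal sketch of converting differently shaped source records into one shared format. It is not the OMOP Common Data Model itself: the field names, the `ConditionRecord` class, the `harmonise` function and the concept labels are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical lookup from local diagnosis labels to one shared vocabulary.
# The concept labels below are placeholders, NOT real OMOP concept IDs.
LOCAL_TO_STANDARD = {
    "t2dm": "CONCEPT_DIABETES_T2",
    "type 2 diabetes": "CONCEPT_DIABETES_T2",
    "hyperlipidaemia": "CONCEPT_HYPERLIPIDAEMIA",
    "high cholesterol": "CONCEPT_HYPERLIPIDAEMIA",
}


@dataclass(frozen=True)
class ConditionRecord:
    """One row in the shared format: who, what, when."""
    person_id: str
    concept: str
    start_date: date


def harmonise(raw: dict) -> ConditionRecord:
    """Convert one source record, whatever its field names, to the shared format."""
    # Different source systems are assumed to use different field names.
    person = raw.get("patient_id") or raw.get("pid")
    label = (raw.get("diagnosis") or raw.get("dx_text") or "").strip().lower()
    when = raw.get("date") or raw.get("visit_date")
    if person is None or when is None:
        raise ValueError(f"missing person or date in {raw!r}")
    if label not in LOCAL_TO_STANDARD:
        # Fail loudly rather than guess a mapping for an unknown label.
        raise ValueError(f"unmapped diagnosis label: {label!r}")
    return ConditionRecord(str(person), LOCAL_TO_STANDARD[label], date.fromisoformat(when))


# Two records from differently structured (hypothetical) systems.
clinic_a = {"patient_id": "A1", "diagnosis": "T2DM", "date": "2020-03-02"}
clinic_b = {"pid": "B7", "dx_text": " Type 2 diabetes ", "visit_date": "2020-03-12"}
print(harmonise(clinic_a))
print(harmonise(clinic_b))
```

Once both records carry the same concept label and the same field layout, they can be counted, filtered and analysed together without special handling for each source.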

Besides these challenges, human biobanks must consider ethical issues that are unique to activities involving human subjects. Key ethical issues are consent, privacy, data security and transparency. The PHEN-GEN patient consent process fulfils the requirements put forth by the Singapore Personal Data Protection Act (PDPA)⁹ and the Human Biomedical Research Act (HBRA),¹⁰ and is further guided by recommendations in the GA4GH consent toolkit.¹¹ To help protect the privacy of participants, DISCOVERY AI has a data de-identification function that can remove personal identifiers such as NRIC, name and date of birth from the patient health data; these identifiers are then stored on a separate server from the one used to store the health data.
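The separation of identifiers from health data can be sketched as follows. This is an assumption-laden illustration, not a description of how DISCOVERY AI is implemented: the field names, the `deidentify` function and the use of a random study ID as the link between the two stores are all hypothetical, and the two dictionaries merely stand in for separate servers.

```python
import secrets
from typing import Tuple

# Field names are assumptions for this sketch; the text names NRIC, name and
# date of birth as examples of identifiers.
IDENTIFIER_FIELDS = ("nric", "name", "date_of_birth")


def deidentify(record: dict) -> Tuple[dict, dict]:
    """Split one record into (health_data, identifiers), linked by a random study ID."""
    # A random ID reveals nothing about the person it was issued to.
    study_id = secrets.token_hex(8)
    identifiers = {k: record[k] for k in IDENTIFIER_FIELDS if k in record}
    health_data = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
    identifiers["study_id"] = study_id
    health_data["study_id"] = study_id
    return health_data, identifiers


# These two dictionaries stand in for two physically separate servers.
health_store = {}
identity_store = {}

# Invented placeholder record; none of these values refer to a real person.
patient = {
    "nric": "PLACEHOLDER-ID",
    "name": "Example Patient",
    "date_of_birth": "1970-01-01",
    "ldl_mmol_l": 4.2,
}
health, identity = deidentify(patient)
health_store[health["study_id"]] = health
identity_store[identity["study_id"]] = identity
print(sorted(health))  # field names only; no identifier fields remain
```

Researchers would work only with the contents of the health store, while re-linking a record to a person would require access to the separately held identity store.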

DISCOVERY AI also includes a data governance function, in line with PDPA and the HBRA requirements. Access to PHEN-GEN samples and data is limited to those investigators whose proposals have been approved by both the relevant Institutional Review Board and the PHEN-GEN Data Governance Committee, a group of people that may include principal investigators and data experts. Approved investigators will be able to access the de-identified data in an environment that is optimised for both accessibility and the safeguarding of the privacy and confidentiality of the people contributing the data.
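The two-step approval rule described above could be expressed in code along these lines. The `Proposal` class and the `may_access` and `fetch_deidentified` functions are hypothetical names for this sketch and do not reflect the actual DISCOVERY AI governance function.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """A research proposal and its approval status (hypothetical structure)."""
    investigator: str
    irb_approved: bool = False  # Institutional Review Board
    dgc_approved: bool = False  # PHEN-GEN Data Governance Committee


def may_access(proposal: Proposal) -> bool:
    """Access requires BOTH approvals; either one alone is not enough."""
    return proposal.irb_approved and proposal.dgc_approved


def fetch_deidentified(proposal: Proposal, health_store: dict) -> list:
    """Return de-identified records only for fully approved proposals."""
    if not may_access(proposal):
        raise PermissionError(f"proposal by {proposal.investigator} is not fully approved")
    return list(health_store.values())


# Invented example data standing in for the de-identified health records.
store = {"a1b2": {"study_id": "a1b2", "ldl_mmol_l": 4.2}}
approved = Proposal("Investigator A", irb_approved=True, dgc_approved=True)
print(fetch_deidentified(approved, store))
```

Keeping the check in one small function means every route to the data can call the same rule, so a proposal with only one of the two approvals is refused consistently.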

Said Prof Tai E Shyong, “As PHEN-GEN and other biobanks continue to add to the quantity and diversity of their samples and data, and as technologies to analyse large numbers of analytes in large populations advance even further, the breadth and impact of applications will only increase.”

References

1 Joshi PH, Martin SS, Blumenthal RS. The fascinating story of PCSK9 inhibition: Insights and perspective from ACC. Cardiology Today. May 2014. https://www.healio.com/cardiology/chd-prevention/news/print/cardiology-today/%7Bd531fcd9-ea52-4230-b412-da9270344fff%7D/the-fascinating-story-of-pcsk9-inhibition-insights-and-perspective-from-acc. Accessed March 12, 2020.

2 Eiseman E, Haga SB. Handbook of Human Tissue Sources. Santa Monica, CA: Rand; 1999.

3 Nagai A, Hirata M, Kamatani Y, et al. Overview of the BioBank Japan Project: Study design and profile. J Epidemiol. 2017;27(3 Suppl):S2-S8.

4 UK Biobank Web site. https://www.ukbiobank.ac.uk/about-biobank-uk/. Updated Jan 24, 2019. Accessed March 2, 2020.

5 Desjardins P. Biobanking for “All of Us.” Genetic Engineering & Biotechnology News. February 1, 2017. https://www.genengnews.com/magazine/286/biobanking-for-all-of-us/. Accessed March 2, 2020.

6 What is BioVU? Vanderbilt Institute for Clinical and Translational Research Web site. https://victr.vumc.org/what-is-biovu/. Accessed March 2, 2020.

7 Denny JC, Ritchie MD, Basford MA, et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010;26:1205-1210.

8 Paul SM, Mytelka DS, Dunwiddie CT, et al. How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nat Rev Drug Discov. 2010;9:203-214.

9 Government of the Republic of Singapore. Personal Data Protection Act 2012. Singapore: Government of the Republic of Singapore, 2012.

10 Government of the Republic of Singapore. Human Biomedical Research Act 2015. Singapore: Government of the Republic of Singapore, 2015.

11 Global Alliance for Genomics & Health Web site. https://www.ga4gh.org/. Accessed March 2, 2020.

12 Schaefer GO, Tai ES, Sun S. Precision medicine and big data: The application of an ethics framework for big data in health and research. Asian Bioethics Review. 2019;11:275-288.

13 Xafis V, Labude MK. Openness in big data and data repositories. Asian Bioethics Review. 2019;11:255-273.

14 World Medical Association. World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA. 2013;310:2191-2194.

15 WMA Declaration of Taipei on Ethical Considerations Regarding Health Databases and Biobanks. October 2016. Accessed March 2, 2020.