Issue 38 / May 2021

IN VIVO

Visiting Scientist Tackles Data at MIT

An NUHS resident gets immersed in data, machine learning and more at the Massachusetts Institute of Technology (MIT).

I was given the opportunity to work as a visiting scientist at the Laboratory for Computational Physiology (LCP) at MIT in spring 2020, under the supervision of Dr Leo Anthony Celi, Principal Research Scientist at the LCP and Intensive Care Physician at Beth Israel Deaconess Medical Center. The LCP curates several large, open access clinical databases on its PhysioNet platform. These include the Medical Information Mart for Intensive Care data sets and the eICU Collaborative Research Database. The LCP also pioneered the datathon model, having held over 35 datathons across the globe. A datathon is a short, two- to three-day event in which clinicians and data scientists form teams to tackle research questions using clinical data.

While at the LCP, I was plugged into multiple research projects involving the use of machine learning techniques to perform a variety of tasks, from the estimation of the causal effect of blood transfusions in critically ill patients using observational data, to predicting patient mortality from COVID-19 using multimodal learning.


>35 global datathons have been organised by MIT LCP


Besides research work, I helped to mentor groups at the first MIT COVID-19 datathon where several hundred participants around the world (including a handful from Singapore!) gathered to hack urban and public health data from New York, and come up with solutions to tackle the COVID-19 pandemic.

My own journey in medical informatics began in my third year of medical school, when I picked up the R programming language through online courses. I continued to develop my programming skills alongside my clinical work and was able to build several internal applications for use in the Singapore Civil Defence Force during my National Service. Programming languages such as R and Python allow one to access and manipulate data at a much larger scale and level of complexity than traditional statistical packages and spreadsheet software.

Employing data in healthcare systems

Healthcare systems generate vast amounts of data every day within their electronic health records in a variety of media, including numeric records, textual clinical notes, waveform data and radiologic images. Such digitalisation of clinical data has brought forth tremendous opportunities to reduce uncertainty in clinical decision making, find better ways to treat diseases and build safer health systems. The field of medical informatics aims to develop tools and methods that make use of health data to improve medical systems and practice.

Predictive analyses in medicine use past data to derive mathematical relationships between the state of the patient at a certain point in time and the state of the disease in the future. Prediction is useful because it could alert clinicians to future adverse events, such as a developing disease or a failing treatment. This could then prompt clinicians to adjust the treatment plan for the patient.
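As a rough illustration of what "deriving a mathematical relationship" from past data can look like, the sketch below fits a logistic regression to a small synthetic cohort. Everything in it is an assumption made for illustration: the two features (lactate and age), the coefficients used to simulate outcomes, and the function names are hypothetical, and none of it comes from a real clinical data set or from the projects described in this article.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_cohort(n_patients, seed=0):
    # Simulated patients only: the feature ranges and the "true"
    # coefficients below are invented for this example.
    rng = random.Random(seed)
    features, outcomes = [], []
    for _ in range(n_patients):
        lactate = rng.uniform(0.5, 8.0)  # mmol/L
        age = rng.uniform(20.0, 90.0)    # years
        # Roughly centre and scale each feature.
        x = [(lactate - 4.0) / 2.0, (age - 55.0) / 20.0]
        true_logit = -1.0 + 1.5 * x[0] + 0.8 * x[1]
        y = 1 if rng.random() < sigmoid(true_logit) else 0
        features.append(x)
        outcomes.append(y)
    return features, outcomes


def fit_logistic(features, outcomes, learning_rate=0.5, epochs=500):
    # Plain batch gradient descent on the mean log-loss.
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, outcomes):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_risk(weights, bias, x):
    # Predicted probability of the adverse outcome for one patient.
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


features, outcomes = make_synthetic_cohort(500)
weights, bias = fit_logistic(features, outcomes)

# Compare a younger patient with low lactate to an older one with high lactate.
low_risk = predict_risk(weights, bias, [-1.5, -1.0])
high_risk = predict_risk(weights, bias, [1.5, 1.0])
print(f"weights={weights}, bias={bias:.2f}")
print(f"low-risk patient: {low_risk:.2f}, high-risk patient: {high_risk:.2f}")
```

The fitted weights are the "mathematical relationship": they turn a patient's current measurements into an estimated probability of a future event. A real model would need far more careful feature selection, validation on held-out data and clinical evaluation than this toy example suggests.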

Several steps are necessary to make use of the vast amounts of data generated by electronic health records for prediction. First, data curation and harmonisation involve the development of systems and processes at the software and network levels to extract operational data generated during the course of clinical practice, in a format and structure accessible to researchers. The next step involves applying robust statistical and machine learning methodology to generate algorithms that answer clinical questions. The last step, and arguably the most difficult, is bringing these algorithms to the bedside through robust clinical trials and evaluating their performance in a safe and unbiased manner.
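To give a flavour of the first step, the sketch below harmonises glucose readings exported by two hospital systems into one common record structure. The two sites, their field names and the sample rows are all hypothetical and chosen only for illustration; the one fixed fact is the unit conversion, which follows from the molar mass of glucose (180.16 g/mol).

```python
from dataclasses import dataclass

# 1 mmol/L of glucose corresponds to 18.016 mg/dL (molar mass 180.16 g/mol).
MGDL_PER_MMOLL_GLUCOSE = 18.016


@dataclass(frozen=True)
class GlucoseReading:
    # Common research-facing schema (hypothetical).
    patient_id: str
    charted_at: str        # ISO 8601 timestamp
    glucose_mmol_l: float


def from_site_a(row):
    # Hypothetical site A exports glucose in mg/dL under its own field names.
    return GlucoseReading(
        patient_id=str(row["pid"]),
        charted_at=row["time"],
        glucose_mmol_l=round(row["glu_mgdl"] / MGDL_PER_MMOLL_GLUCOSE, 2),
    )


def from_site_b(row):
    # Hypothetical site B already records mmol/L but names fields differently.
    return GlucoseReading(
        patient_id=str(row["patient"]),
        charted_at=row["charted_at"],
        glucose_mmol_l=round(row["glucose_mmol"], 2),
    )


# Made-up sample rows, not real patient data.
site_a_rows = [{"pid": 101, "time": "2020-03-01T08:00:00", "glu_mgdl": 180.16}]
site_b_rows = [{"patient": "B-7", "charted_at": "2020-03-01T09:30:00", "glucose_mmol": 6.4}]

harmonised = [from_site_a(r) for r in site_a_rows] + [from_site_b(r) for r in site_b_rows]
for reading in harmonised:
    print(reading)
```

Once every source is mapped into the same structure and units, the statistical and machine learning step can treat the combined records as a single data set.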

Homecoming

I am working with Assistant Professor Kenneth Ban on the Health Informatics Pathway at NUS Medicine to train the next generation of medical students and physicians in the fundamentals of clinical informatics and data science. I also teach a data science workshop for clinicians at National University Hospital (NUH), and I continue to be actively involved in research projects that use machine learning techniques to solve clinical problems.

I thank Dr Leo Celi for the guidance provided to me during my time in the US; Associate Professor Dan Yock Young and Dr Adrian Kee for the support they have provided; Dr Ngiam Kee Yuan for endorsing me; and A/Prof Kenneth Ban for providing the opportunity to contribute to the Health Informatics Pathway. Lastly, I thank the Yong Loo Lin School-NUHS-Harvard-BIDMC Programme for the funding support.