Application of database and data science techniques in the Malaysian Breast Cancer Survivorship Cohort Study

Mogana Darshini Ganggayah, Sarinder Kaur Dhillon, Nur Aishah Mohd Taib, Tania Islam

Research output: Contribution to conference › Poster › peer-review


Background: Breast cancer is one of the leading causes of mortality among women worldwide. The Breast Cancer Resource Centre (BCRC) of University Malaya Medical Centre (UMMC), Kuala Lumpur, Malaysia, started the Malaysian Breast Cancer Survivorship Cohort (MyBCC) study in 2012.

Aim: As a further enhancement of the research, the MyBCC database was developed to make the survey more convenient to conduct and to predict, using data science techniques, the factors influencing differences in survival rates among patients of multiethnic origin.

Methods: The database comprises lifestyle-related data on the patients, including demographic factors, information on other illnesses, clinical factors, quality of life, psychosocial support, physical activity, work-related questions, depression score, family background, type of medication consumed, and financial status. This paper presents an approach to building an automated workflow using the MySQL database management system and PHP, integrated with R and HTML for web display.

Results: A relational database comprising the data of 816 breast cancer patients was developed for the MyBCC cohort study. This database serves as the backend for the MyBCC application, in which researchers can register new patient records and update and view the information of recruited patients in a more convenient environment than before. In addition, the MyBCC database has been integrated with R through the RMySQL package to perform audits. A few important analyses that use the plotly package and leverage the integration of R with the database are presented.

Conclusion: This paper presents the development of the MyBCC database, which aims to automate the manual process of data entry, storage, and analysis for performing audits in the breast cancer cohort study. The integration of the database with R for automated data analysis is also shown, using examples of predictions that can be generated with functions in R. This fully automated workflow reduces the workload and time required compared with performing manual predictions on data stored in flat files.
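
The database-to-analysis step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the poster uses MySQL with R and the RMySQL package, whereas the example below swaps in Python's standard-library `sqlite3` module so that it runs without a database server. The table names, column names, group labels, and rows are all hypothetical placeholders, not taken from the MyBCC database.

```python
import sqlite3

# Hypothetical schema for illustration only; the real MyBCC tables are not
# described in the abstract.
SCHEMA = """
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    ethnicity  TEXT NOT NULL,
    recruited  TEXT NOT NULL            -- ISO date of recruitment
);
CREATE TABLE followup (
    followup_id INTEGER PRIMARY KEY,
    patient_id  INTEGER NOT NULL REFERENCES patient(patient_id),
    qol_score   REAL                    -- NULL marks a missing questionnaire score
);
"""


def build_demo_db():
    """Create an in-memory database filled with synthetic placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO patient VALUES (?, ?, ?)",
        [(1, "A", "2012-03-01"), (2, "A", "2012-05-10"), (3, "B", "2013-01-15")],
    )
    conn.executemany(
        "INSERT INTO followup (patient_id, qol_score) VALUES (?, ?)",
        [(1, 70.0), (2, None), (3, 55.0)],
    )
    conn.commit()
    return conn


def audit_missing_scores(conn):
    """Return (group, number of patients, number of missing scores) per group.

    The query runs directly against the relational tables, so the audit
    needs no manual export to flat files.
    """
    query = """
        SELECT p.ethnicity,
               COUNT(DISTINCT p.patient_id) AS n_patients,
               SUM(f.qol_score IS NULL)     AS n_missing
        FROM patient AS p
        LEFT JOIN followup AS f ON f.patient_id = p.patient_id
        GROUP BY p.ethnicity
        ORDER BY p.ethnicity
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for group, n_patients, n_missing in audit_missing_scores(build_demo_db()):
        print(group, n_patients, n_missing)
```

In the workflow the abstract describes, the equivalent query would presumably be issued from R over an RMySQL connection and its result passed to plotly for display; the sketch only shows the general pattern of auditing data where it is stored.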
Original language: English
Number of pages: 1
Publication status: Published - 2018
Event: International Conference on Bioinformatics 2018 - Jawaharlal Nehru University, New Delhi, India
Duration: 26 Sept 2018 to 28 Sept 2018
Conference number: 17th


Conference: International Conference on Bioinformatics 2018
Abbreviated title: InCoB 2018
City: New Delhi