Geospatial Features Influencing the Formation of COVID-19 Clusters

Citation

Haque, Radiah and Ting, Choo Yee and Ee, Yeo Keat and Ng, Keng Hoong and Shaaban, Mohamed Najib and Pee, Chih Yang and Wong, Lai Kuan and Baha Raja, Dhesi (2022) Geospatial Features Influencing the Formation of COVID-19 Clusters. Journal of System and Management Sciences, 12 (5). pp. 1-20. ISSN 1816-6075, 1818-0523

Abstract

Machine learning methods have been used to combat COVID-19 since the pandemic began in 2020. Most studies have focused on detecting and identifying the characteristics of SARS-CoV-2, especially via image processing, and some have applied machine learning to contact tracing to minimise the transmission of COVID-19. However, limited work has reported on how geospatial features influence the transmission of COVID-19 and the formation of clusters at local scale. This paper therefore studies the importance of geospatial features that contributed to COVID-19 cluster formation in Kuala Lumpur, Malaysia, in 2021. Several datasets were used, including the address details of confirmed positive COVID-19 cases and the details of nearby residential areas and Points of Interest (POI) located within the Federal Territory of Kuala Lumpur. The datasets were pre-processed and transformed into an analytical dataset for empirical investigation. Various feature selection methods were applied, including the Boruta algorithm, the Chi-square (Chi2) test, the Extra Trees Classifier (ETC), Recursive Feature Elimination (RFE), and a Deep Learning Autoencoder (DLA). The top-n features from each method were examined in detail to elicit a set of optimal features. Several machine learning models were then trained on the optimal features, including Logistic Regression (LR), Random Forest Classifier (RFC), Naïve Bayes Classifier (NBC), and Extreme Gradient Boosting (XGBoost). Boruta produced the optimal feature set, with n = 96 features, and RFC achieved the best prediction results among the classifiers, with around 95% accuracy. The findings help to recognise the geospatial features that affect the formation of clusters of COVID-19 and other infectious diseases at local scale.
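To make the top-n feature selection step more concrete, the following is a minimal sketch of chi-square feature ranking, one of the methods named in the abstract. It is not the authors' implementation: the Pearson 2x2 contingency form of the statistic, the feature names (`near_market`, `near_school`, `near_park`), the toy data, and the helper functions `chi2_score` and `top_n_features` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def chi2_score(feature: List[int], label: List[int]) -> float:
    """Pearson chi-square statistic for a binary feature vs. a binary label."""
    n = len(feature)
    # Observed counts of the 2x2 contingency table, keyed by (feature, label).
    observed = {(f, y): 0 for f in (0, 1) for y in (0, 1)}
    for f, y in zip(feature, label):
        observed[(f, y)] += 1

    score = 0.0
    for f in (0, 1):
        row_total = observed[(f, 0)] + observed[(f, 1)]
        for y in (0, 1):
            col_total = observed[(0, y)] + observed[(1, y)]
            # Expected count if feature and label were independent.
            expected = row_total * col_total / n
            if expected > 0:
                score += (observed[(f, y)] - expected) ** 2 / expected
    return score


def top_n_features(
    columns: Dict[str, List[int]], label: List[int], n: int
) -> List[Tuple[str, float]]:
    """Rank features by chi-square score and keep the n highest."""
    scored = [(name, chi2_score(values, label)) for name, values in columns.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


# Hypothetical toy data: 1 = area belongs to a cluster, 0 = it does not.
label = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical binary geospatial features (e.g. a POI type is nearby or not).
columns = {
    "near_market": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly aligned with label
    "near_school": [1, 1, 1, 0, 1, 0, 0, 0],  # partially aligned
    "near_park": [1, 0, 1, 0, 1, 0, 1, 0],    # independent of label
}

ranking = top_n_features(columns, label, n=2)
print(ranking)
```

On this toy data the feature that matches the label exactly ranks first and the independent feature is dropped. In a study like the one described, the retained features would then be passed to the classifiers for training and evaluation.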

Item Type: Article
Uncontrolled Keywords: machine learning, geospatial analytics, feature importance, COVID19 clusters
Subjects: Q Science > Q Science (General) > Q300-390 Cybernetics
Divisions: Faculty of Computing and Informatics (FCI)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 01 Dec 2022 04:01
Last Modified: 01 Dec 2022 04:01
URI: http://shdl.mmu.edu.my/id/eprint/10879
