Improvement of k-Means Clustering Performance on Disease Clustering using Gaussian Mixture Model

Citation

Santoso, Heru Agus and Haw, Su Cheng (2023) Improvement of k-Means Clustering Performance on Disease Clustering using Gaussian Mixture Model. Journal of System and Management Sciences, 13 (5). ISSN 1816-6075, 1818-0523

Official URL: https://doi.org/10.33168/JSMS.2023.0511

Abstract

The k-Means clustering algorithm is an unsupervised learning method that assigns each data point to exactly one cluster. In practice, a data point can belong to two or more clusters. In our dataset, a particular set of diseases can be a member of clusters in different locations. A Gaussian Mixture Model (GMM) addresses this limitation of k-Means' hard assignment technique. The dataset was also preprocessed using PCA after the Hopkins statistic proved far from sufficient for clustering purposes. PCA reduces the dimensionality of the dataset and provides the most informative variables, which explain the majority of the variance in the data. After PCA, the Hopkins statistic reached 0.958, indicating that the dataset has a high tendency to cluster. Comparing GMM with k-Means by log-likelihood, GMM yielded the better result, 2.217, compared with -606.604 for k-Means. This means that GMM outperforms k-Means in terms of model fit to the dataset.
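The difference between hard and soft assignment, and the log-likelihood used as a fit measure, can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the two-component model, its parameters (`means`, `stds`, `weights`), and the helper names are hypothetical values chosen only for illustration, and the paper's actual data and fitting procedure are not reproduced here.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def hard_assign(x, centroids):
    """k-Means style: return the index of the single nearest centroid."""
    return min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))


def soft_assign(x, weights, means, stds):
    """GMM style: return the posterior probability of each component for x."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


def avg_log_likelihood(data, weights, means, stds):
    """Mean log-likelihood per sample under the mixture (higher is a better fit)."""
    logs = [
        math.log(sum(w * gaussian_pdf(x, m, s)
                     for w, m, s in zip(weights, means, stds)))
        for x in data
    ]
    return sum(logs) / len(logs)


# Hypothetical two-cluster model (assumed values, not taken from the paper).
means = [0.0, 4.0]
stds = [1.0, 1.0]
weights = [0.5, 0.5]

for x in (0.0, 1.9, 2.0):
    memberships = [round(r, 3) for r in soft_assign(x, weights, means, stds)]
    print(x, "hard:", hard_assign(x, means), "soft:", memberships)

print("avg log-likelihood:",
      avg_log_likelihood([0.0, 1.9, 2.0], weights, means, stds))
```

For a point midway between the two centres, the hard assignment still has to pick one cluster, whereas the soft assignment reports equal membership in both, which is the behaviour the abstract attributes to GMM.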

Item Type:	Article
Uncontrolled Keywords:	Clustering, soft-assignment clustering, k-Means, Gaussian Mixture Model.
Subjects:	Q Science > QA Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science
Divisions:	Faculty of Computing and Informatics (FCI)
Depositing User:	Ms Nurul Iqtiani Ahmad
Date Deposited:	31 Oct 2023 09:14
Last Modified:	31 Oct 2023 09:14
URI:	http://shdl.mmu.edu.my/id/eprint/11806
