Citation
Azman, Afizan and Husin, Husna Sarirah and Hamzah, Norhidayah and Phang, Swee King and Yogarayan, Sumendra and Lalitha, R. (2024) Analysis of the Central Limit Theorem-Driven Numerosity Reduction Technique: Implications for Binary Classification Algorithm Performance and Training Duration. In: 9th International Conference on Micro-Electronics, Electromagnetics and Telecommunications, ICMEET 2024, 19 December 2024 - 20 December 2024, Kolkata.
Text
Analysis of the Central Limit.pdf (Published Version, 452kB). Restricted to repository staff only.
Abstract
The rapid growth of data in recent years has made datasets increasingly large, which complicates the development of machine learning models: contemporary algorithms are often complex, so training is slow and development cycles are long. To address this, this paper proposes a novel numerosity reduction algorithm that leverages the Central Limit Theorem (CLT) to reduce the number of data points (cloud points) used to train machine learning models, while preserving interpretability and transforming the distribution of the variables towards a Gaussian. This is advantageous because several algorithms assume that the data are normally distributed. By harnessing the properties of the CLT, the technique is expected to shorten the training time of binary classification models and to improve the performance both of algorithms that assume Gaussian-distributed training features and of algorithms that rely on gradient descent for optimization. Faster training is pivotal for developing large and complex machine learning algorithms, so this technique may make such models more feasible to train and deploy.
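The abstract does not spell out the reduction procedure, so the following is only a minimal sketch under one assumption: that CLT-driven numerosity reduction works by shuffling the rows of each class, splitting them into groups of a fixed size, and replacing each group with its feature-wise mean. The names `clt_reduce`, `group_size` and `skewness` are hypothetical and are not taken from the paper. Because each reduced row is an average of many original rows, its features are approximately Gaussian by the CLT, and they stay in the original units, which is one way to read "preserving interpretability".

```python
import random
from collections import defaultdict


def clt_reduce(X, y, group_size=30, seed=0):
    """Hypothetical CLT-style numerosity reduction (an assumption, not the paper's algorithm).

    Rows are split by class label, shuffled, and partitioned into groups of
    `group_size`. Each group is replaced by its feature-wise mean, so each
    class shrinks by a factor of roughly `group_size`.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    X_red, y_red = [], []
    for label, rows in by_class.items():
        rows = rows[:]  # copy so the caller's data is not reordered
        rng.shuffle(rows)
        # Drop an incomplete final group so every mean averages exactly group_size rows.
        n_full = len(rows) // group_size
        for g in range(n_full):
            group = rows[g * group_size:(g + 1) * group_size]
            # zip(*group) yields one tuple per feature column.
            X_red.append([sum(col) / group_size for col in zip(*group)])
            y_red.append(label)
    return X_red, y_red


def skewness(values):
    """Population skewness; 0 for a symmetric distribution such as a Gaussian."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic binary-classification data with strongly skewed (exponential) features.
data_rng = random.Random(42)
y = [i % 2 for i in range(6000)]
X = [[data_rng.expovariate(1.0) + label, data_rng.expovariate(0.5)] for label in y]

X_small, y_small = clt_reduce(X, y, group_size=30)

# Compare the shape of one feature for class 0 before and after reduction.
raw0 = [row[0] for row, lab in zip(X, y) if lab == 0]
red0 = [row[0] for row, lab in zip(X_small, y_small) if lab == 0]
print(f"rows: {len(X)} -> {len(X_small)}")
print(f"class-0 skewness: before {skewness(raw0):.2f}, after {skewness(red0):.2f}")
```

With 3,000 rows per class and `group_size=30`, each class is reduced to 100 rows. The skewness of the exponential feature falls towards 0 because averaging 30 rows pulls the distribution towards a Gaussian. Grouping within each class keeps the labels meaningful for binary classification. Note that averaging also shrinks each feature's variance by roughly a factor of `group_size`, so any reported training-time or accuracy effect depends on this choice.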
| Field | Value |
|---|---|
| Item Type: | Conference or Workshop Item (Paper) |
| Uncontrolled Keywords: | Central limit theorem (CLT), classification models, gaussianity, numerosity reduction |
| Subjects: | Q Science > QA Mathematics > QA273-280 Probabilities. Mathematical statistics |
| Divisions: | Faculty of Information Science and Technology (FIST) |
| Depositing User: | Nor Afiqah Mohd Adnan |
| Date Deposited: | 09 Dec 2025 03:24 |
| Last Modified: | 09 Dec 2025 03:24 |
| URI: | http://shdl.mmu.edu.my/id/eprint/14973 |
