Citation
Hossen, Jakir and Jesmeen H, M Z and Sayeed, Shohel (2018) Modifying cleaning method in big data analytics process using random forest classifier. In: 2018 7th International Conference on Computer and Communication Engineering (ICCCE). IEEE, pp. 208-213. ISBN 978-1-5386-6992-1, 978-1-5386-6991-4
Text
hossen2018.pdf (Published Version, 286kB), restricted to repository staff only
Abstract
Accurate data is a key success factor in the performance of data analytics, especially for detection and prediction purposes. Nowadays, Big Data analytics (BDA) is used to analyze the sheer volume of data available in an organization. The quality of these data must be maintained in order to obtain correct alerts and valuable insights from rapidly changing data characterized by high volume, velocity, variety, veracity, and value. This paper aims to modify an existing big data analytics framework by improving an important pre-processing step, namely data cleaning. First, Random Forest-based feature selection is used to extract effective features. Then, two classifiers (a Random Forest classifier and a Linear SVM classifier) are trained on the dataset to classify data quality and to develop an intelligent model. In evaluation, our experimental results show a consistent accuracy of around 90% for the Random Forest and Linear SVM classifiers. Using this approach, we expect to provide a set of cleaned data for further processing. In addition, analysts can use this system during the cleaning stage of the data analytics process to confirm that the data has been cleaned. Finally, the developed system is compared with available functions that are used to handle missing values.
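The two-stage approach described in the abstract (Random Forest feature selection, then a Random Forest classifier and a Linear SVM classifier that label data quality) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the dataset, features, or hyperparameters, so synthetic data from `make_classification` stands in for the data-quality dataset, with hypothetical labels 0 = clean and 1 = dirty. The mean-importance selection threshold, the hyperparameters, and the helper names `build_models` and `evaluate` are all illustrative choices.

```python
"""Sketch: Random Forest feature selection followed by Random Forest and
Linear SVM classifiers that label records as clean (0) or dirty (1).

Assumption: synthetic data from make_classification stands in for the
paper's (unspecified) data-quality dataset; all names are hypothetical.
"""
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def build_models(random_state=0):
    """Return both classifiers, each preceded by RF-based feature selection."""

    def rf_selector():
        # Keep features whose RF importance exceeds the mean importance
        # (SelectFromModel's default threshold for tree ensembles).
        return SelectFromModel(
            RandomForestClassifier(n_estimators=100, random_state=random_state)
        )

    random_forest = make_pipeline(
        rf_selector(),
        RandomForestClassifier(n_estimators=200, random_state=random_state),
    )
    # Linear SVMs are sensitive to feature scale, so standardise first.
    linear_svm = make_pipeline(
        rf_selector(),
        StandardScaler(),
        LinearSVC(C=1.0, dual=False, max_iter=10_000, random_state=random_state),
    )
    return {"random_forest": random_forest, "linear_svm": linear_svm}


def evaluate(n_samples=2000, random_state=0):
    """Train both models on a hypothetical dataset and return test accuracy."""
    # Hypothetical stand-in: 20 candidate quality features, 5 informative.
    X, y = make_classification(
        n_samples=n_samples,
        n_features=20,
        n_informative=5,
        n_redundant=5,
        random_state=random_state,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    scores = {}
    for name, model in build_models(random_state).items():
        # The selector is fitted inside the pipeline, on training data only.
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    for name, accuracy in evaluate().items():
        print(f"{name}: accuracy = {accuracy:.3f}")
```

Wrapping the selector and classifier in one pipeline ensures that feature selection is learned from the training split alone, so the test accuracy is not inflated by information leaking from the test data. Any accuracy this sketch prints reflects the synthetic data only and should not be read as a reproduction of the roughly 90% reported in the paper.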
| Field | Value |
|---|---|
| Item Type | Book Section |
| Uncontrolled Keywords | Support vector machines, Data models, Cleaning, Feature extraction, Training, Big Data, Forestry |
| Subjects | Q Science > QA Mathematics > QA299.6-433 Analysis |
| Divisions | Faculty of Engineering (FOE) |
| Depositing User | Ms Suzilawati Abu Samah |
| Date Deposited | 04 Jan 2021 07:42 |
| Last Modified | 04 Jan 2021 07:42 |
| URI | http://shdl.mmu.edu.my/id/eprint/7309 |