Towards machine learning-based self-tuning of Hadoop-Spark system

Citation

Rahman, Md. Armanur and Hossen, Jakir and Venkataseshaiah, Chinthakunta and Bhuvaneswari, Thangavel and Sultana, Aziza and Hossen, Abid (2019) Towards machine learning-based self-tuning of Hadoop-Spark system. Indonesian Journal of Electrical Engineering and Computer Science, 15 (2). pp. 1076-1085. ISSN 2502-4752

259.pdf - Published Version
Restricted to Repository staff only

Abstract

Apache Spark is an open-source distributed platform that uses the concept of distributed memory to process big data. Spark has more than 180 predominant configuration parameters. These configuration settings directly control how efficiently Apache Spark processes big data, yet finding the settings that give the best outcome is a challenging task because there are so many parameters. Currently, these predominant parameters are tuned manually by trial and error. To overcome this manual tuning problem, this paper proposes and develops a self-tuning approach using machine learning. The approach can tune parameter values when required. It was implemented on a Dell server, and experiments were carried out on five different dataset sizes and parameter settings. The experimental results of the proposed approach are compared with those of the default Spark configuration. The results demonstrate that execution is sped up by about 33% on average compared to the default configuration.
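
As a rough illustration of what automated tuning against a default baseline can look like, the sketch below runs a plain random search over three real Spark properties and reports the improvement over the default settings. It is not the paper's method: the abstract does not describe the machine-learning model, so random search is swapped in as a stand-in. The candidate values, the `simulated_runtime` cost model, and the helper names are all hypothetical, and the speed-up is assumed here to mean the percentage reduction in execution time.

```python
import random

# Real Spark property names; the candidate values are illustrative assumptions.
SEARCH_SPACE = {
    "spark.executor.memory": [1, 2, 4, 8],  # in GB, e.g. 4 would be passed as "4g"
    "spark.executor.cores": [1, 2, 4],
    "spark.sql.shuffle.partitions": [50, 100, 200, 400],
}

# Baseline to compare against (assumed to mirror Spark's documented defaults).
DEFAULT_CONFIG = {
    "spark.executor.memory": 1,
    "spark.executor.cores": 1,
    "spark.sql.shuffle.partitions": 200,
}


def simulated_runtime(config):
    """Hypothetical cost model in seconds; a real tuner would run the Spark job."""
    cpu_part = 100.0 / config["spark.executor.cores"]
    memory_part = 40.0 / config["spark.executor.memory"]
    shuffle_part = 0.05 * abs(config["spark.sql.shuffle.partitions"] - 100)
    return 10.0 + cpu_part + memory_part + shuffle_part


def random_search(space, trials, seed=0):
    """Sample configurations at random and keep the fastest one seen so far."""
    rng = random.Random(seed)
    best_config = dict(DEFAULT_CONFIG)  # start from the default baseline
    best_time = simulated_runtime(best_config)
    for _ in range(trials):
        candidate = {name: rng.choice(values) for name, values in space.items()}
        elapsed = simulated_runtime(candidate)
        if elapsed < best_time:
            best_config, best_time = candidate, elapsed
    return best_config, best_time


def speedup_percent(default_time, tuned_time):
    """Percentage reduction in execution time relative to the default."""
    return 100.0 * (default_time - tuned_time) / default_time


best_config, best_time = random_search(SEARCH_SPACE, trials=30)
default_time = simulated_runtime(DEFAULT_CONFIG)
print(best_config)
print(f"{speedup_percent(default_time, best_time):.1f}% faster than the default")
```

In a real setting, `simulated_runtime` would be replaced by a measured job run, and the chosen values would be passed to Spark, for example through `--conf` options to `spark-submit`.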

Item Type: Article
Uncontrolled Keywords: Apache Spark, Big data, Machine learning, Self-tuning, Spark parameter
Subjects: Q Science > Q Science (General) > Q300-390 Cybernetics
Divisions: Faculty of Engineering and Technology (FET)
Depositing User: Ms Suzilawati Abu Samah
Date Deposited: 24 Feb 2022 03:44
Last Modified: 24 Feb 2022 03:44
URI: http://shdl.mmu.edu.my/id/eprint/9181
