In-Cluster Combiner Algorithm on Virtual Hadoop Clusters

Citation

Mhlanga, Imran Artwel J. (2019) In-Cluster Combiner Algorithm on Virtual Hadoop Clusters. Masters thesis, Multimedia University.

Full text not available from this repository.
Official URL: http://erep.mmu.edu.my/

Abstract

As a result of technological advancements such as the emergence of the Internet of Things (IoT) and machine-to-machine communication, data is being generated in higher volumes and at a faster rate than before. Such data is so large and structurally complex that traditional hardware and software no longer have adequate storage and processing capability; thus, there is a need for a high-performance, efficient framework, such as Apache Hadoop, that is capable of storing and processing very large datasets. MapReduce, a core component of Hadoop, has been widely deployed as the most efficient framework for big data processing because it runs on commodity hardware and manages the parallel execution of tasks automatically and effectively. The problem with a typical MapReduce job is that it produces a very large volume of intermediate data, which often causes network congestion while the data is shuffled to the reducers. The delays caused by this congestion often result in slower job completion times. Furthermore, job completion times tend to become longer when a job is run in a geo-distributed environment where nodes are in different regions. Therefore, this research aims to address the problem of the size of the data produced by the map tasks while running a job in a geo-distributed cluster. The research proposes the In-Cluster Combiner (ICC), an aggregation algorithm that aims to reduce the size of the intermediate data produced by map tasks. It does this by aggregating all the data within the cluster in which the map tasks are running. Existing aggregation algorithms are designed only for jobs run in a single cluster. The proposed algorithm performs better than the existing algorithms when running a job in a geo-distributed cluster.
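
The aggregation idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's ICC implementation and does not use the Hadoop API; it is a plain in-memory word count, and the function names (`map_task`, `in_cluster_combine`, `reduce_across_clusters`), the cluster names, and the sample input are all assumptions made for illustration. It only shows why merging map output inside each cluster before the shuffle reduces the number of records that must cross between clusters.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_task(lines: Iterable[str]) -> List[Pair]:
    """Emit (word, 1) for every word, as a word-count mapper would."""
    return [(word, 1) for line in lines for word in line.split()]


def in_cluster_combine(cluster_splits: List[List[str]]) -> Dict[str, int]:
    """Hypothetical in-cluster step: sum the values per key over the
    output of every map task that ran in the same cluster."""
    totals: Counter = Counter()
    for split in cluster_splits:
        for key, value in map_task(split):
            totals[key] += value
    return dict(totals)


def reduce_across_clusters(outputs: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Final reduce over the already-aggregated output of each cluster."""
    totals: Counter = Counter()
    for output in outputs:
        totals.update(output)  # Counter.update adds counts per key
    return dict(totals)


# Illustrative input: two clusters, each running two map tasks (one per split).
clusters = {
    "cluster_a": [["iot data iot"], ["data data"]],
    "cluster_b": [["iot machine"], ["machine data"]],
}

# Records that would cross the inter-cluster network without any combining.
raw_records = sum(
    len(map_task(split)) for splits in clusters.values() for split in splits
)

# Records that cross the network after combining inside each cluster.
combined = {name: in_cluster_combine(splits) for name, splits in clusters.items()}
combined_records = sum(len(output) for output in combined.values())

result = reduce_across_clusters(combined.values())
print(raw_records, combined_records, result)
```

In this toy input the in-cluster step sends one record per distinct key per cluster instead of one record per word occurrence, while the final totals are unchanged. The reduction achieved on a real geo-distributed Hadoop job depends on the key distribution and is not something this sketch measures.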

Item Type: Thesis (Masters)
Additional Information: Call No: QA9.58 .I47 2019
Uncontrolled Keywords: Algorithms
Subjects: Q Science > QA Mathematics > QA1-43 General
Divisions: Faculty of Information Science and Technology (FIST)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 24 Feb 2023 09:12
Last Modified: 24 Feb 2023 09:12
URI: http://shdl.mmu.edu.my/id/eprint/11152
