Performance Improvement in Hadoop Mapreduce

Kalia, Khushboo

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/427502

Title:	Performance Improvement in Hadoop Mapreduce
Researcher:	Kalia, Khushboo
Guide(s):	Nagpal, Pooja and Neeraj Gupta
Keywords:	Computer Science; Computer Science Artificial Intelligence; Engineering and Technology
University:	K.R. Mangalam University, Gurgaon
Completed Date:	2022
Abstract:	Big Data was first used in 1970 for atmospheric and deep-sea soundings. It is a collection of data sets too large and complex to be processed by traditional tools. The MapReduce model was introduced by Google, and the Hadoop MapReduce framework implements it for processing vast amounts of data in a parallel and distributed environment. The default Hadoop implementation assumes that the executing nodes are homogeneous. The simplicity of the model and the fault tolerance of the framework make it very popular for processing Big Data. As this programming model grows in popularity, the scheduling of jobs and the locality of data become very significant. Data locality is an important feature that Hadoop introduced to improve the performance of the model. The key idea is to move the map task close to the node where the data resides rather than transferring the vast data set to the computation. Data locality helps lower network congestion and improve performance. However, this practice fails when processing data in a heterogeneous Hadoop cluster. In a heterogeneous setup, nodes with different computational capabilities pose a crucial challenge: nodes with faster processing capacity finish their work sooner than nodes with slower processing ability. The objective of this dissertation is to provide a scheduling approach based on KNN clustering and prefetching. The process starts with speculative prefetching and then performs KNN clustering on the intermediate map output before directing it to the reducer for final processing. The scheduler's performance is evaluated by executing different workloads such as word count, random text, random num, and Sort. The results show that the proposed idea improves the performance of job execution.
Pagination:	XII, 136
URI:	http://hdl.handle.net/10603/427502
Appears in Departments:	Department of Computer Science

Files in This Item:

File	Description	Size	Format
01_title page.pdf	Attached File	43.08 kB	Adobe PDF
02_prelim.pdf		252.55 kB	Adobe PDF
03_content.pdf		126.27 kB	Adobe PDF
04_abstract.pdf		120.09 kB	Adobe PDF
05_chapter1- introduction.pdf		751.96 kB	Adobe PDF
06_chapter 2 literature survey.pdf		786.5 kB	Adobe PDF
07_chapter 3 methodology.pdf		554.62 kB	Adobe PDF
08_chapter 4 result and discussion.pdf		421.99 kB	Adobe PDF
09_chapter 5 conclusion and future work.pdf		216.71 kB	Adobe PDF
10_annexures.pdf		1.85 MB	Adobe PDF
80_recommendation.pdf		242.28 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET