Performance Improvement in Hadoop Mapreduce

Kalia, Khushboo

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/427502

Full metadata record

DC Field	Value	Language
dc.coverage.spatial
dc.date.accessioned	2022-12-18T09:31:06Z	-
dc.date.available	2022-12-18T09:31:06Z	-
dc.identifier.uri	http://hdl.handle.net/10603/427502	-
dc.description.abstract	The term Big Data was initially used in 1970 in connection with atmospheric and deep-sea soundings. It is a collection of data sets too large and complex to be processed by traditional tools. The Hadoop MapReduce framework, an open-source implementation of the MapReduce model introduced by Google, processes vast amounts of data in a parallel and distributed environment. The default Hadoop implementation assumes that the executing nodes are homogeneous. The simplicity of the model and the fault-tolerance feature of the framework make it very popular for processing Big Data. As this programming model gains popularity, the scheduling and locality of jobs and data become very significant. Data locality is an important feature that Hadoop introduced to improve the performance of the model. The key idea is to move the map task to, or close to, the node where the actual data resides rather than transferring the vast data set to the computation. Data locality helps in lowering network congestion and improving performance. However, this practice fails when processing data in a heterogeneous Hadoop cluster. In a heterogeneous setup, nodes with different computational capabilities pose a crucial challenge: nodes with faster processing capacity finish their tasks sooner than nodes with slower processing ability. The objective of this dissertation is to provide a scheduling theory based on KNN clustering and prefetching. The process starts with speculative prefetching and then performs KNN clustering on the intermediate map output before directing it to the reducer for final processing. The scheduler's performance is evaluated by executing different workloads such as word count, random text, random num, and Sort. The results show that the proposed idea improves the performance of job execution.
dc.format.extent	XII, 136
dc.language	English
dc.relation	115
dc.rights	university
dc.title	Performance Improvement in Hadoop Mapreduce
dc.title.alternative
dc.creator.researcher	Kalia, Khushboo
dc.subject.keyword	Computer Science
dc.subject.keyword	Computer Science Artificial Intelligence
dc.subject.keyword	Engineering and Technology
dc.description.note
dc.contributor.guide	Nagpal, Pooja and Gupta, Neeraj
dc.publisher.place	Gurgaon
dc.publisher.university	K.R. Mangalam University, Gurgaon
dc.publisher.institution	Department of Computer Science and Engineering
dc.date.registered	2015
dc.date.completed	2022
dc.date.awarded	2022
dc.format.dimensions	21X29.7
dc.format.accompanyingmaterial	DVD
dc.source.university	University
dc.type.degree	Ph.D.
Appears in Departments:	Department of Computer Science

Files in This Item:

File	Description	Size	Format
01_title page.pdf	Attached File	43.08 kB	Adobe PDF
02_prelim.pdf		252.55 kB	Adobe PDF
03_content.pdf		126.27 kB	Adobe PDF
04_abstract.pdf		120.09 kB	Adobe PDF
05_chapter1- introduction.pdf		751.96 kB	Adobe PDF
06_chapter 2 literature survey.pdf		786.5 kB	Adobe PDF
07_chapter 3 methodology.pdf		554.62 kB	Adobe PDF
08_chapter 4 result and discussion.pdf		421.99 kB	Adobe PDF
09_chapter 5 conclusion and future work.pdf		216.71 kB	Adobe PDF
10_annexures.pdf		1.85 MB	Adobe PDF
80_recommendation.pdf		242.28 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET