Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/427502
Title: Performance Improvement in Hadoop Mapreduce
Researcher: Kalia, Khushboo
Guide(s): Nagpal, Pooja and Neeraj Gupta
Keywords: Computer Science; Computer Science Artificial Intelligence; Engineering and Technology
University: K.R. Mangalam University, Gurgaon
Completed Date: 2022
Abstract: Big Data was initially used in 1970 for atmospheric and deep-sea soundings. It is a collection of data sets too large and complex to be processed by traditional tools, which were not designed to handle such huge volumes of data.

The MapReduce programming model was developed by Google for processing vast amounts of data in a parallel, distributed environment, and Hadoop MapReduce is its widely used open-source implementation. The default Hadoop implementation assumes that the executing nodes are homogeneous. The simplicity of the model and the fault tolerance of the framework make it very popular for processing Big Data. As the programming model has gained popularity, the scheduling of jobs and the locality of data have become very significant.

Data locality is an important feature that Hadoop introduced to improve the performance of the model. The key idea is to move the map task to the node where the data actually resides rather than transferring the vast data set to the computation. Data locality reduces network congestion and improves performance. However, this practice breaks down when data are processed in a heterogeneous Hadoop cluster. In a heterogeneous setup, nodes with different computational capabilities pose a crucial challenge: nodes with faster processing capacity finish their local tasks sooner than nodes with slower processing ability, and must then process data stored on other nodes, which has to be transferred over the network.

The objective of this dissertation is to propose a scheduling approach based on K-nearest-neighbour (KNN) clustering and prefetching. The process begins with speculative prefetching and then performs KNN clustering on the intermediate map output before directing it to the reducers for final processing. Scheduler performance is evaluated by executing different workloads: word count, random text, random num, and sort. The results show that the proposed approach improves job execution performance.
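The data-locality problem in a heterogeneous cluster can be illustrated with a small simulation. This is a minimal sketch, not the scheduler proposed in the thesis: it assumes one map task per input block, a greedy locality-first policy, and a fixed hypothetical `transfer_cost` for reading a block stored on another node. `Node`, `schedule`, and all parameter values are hypothetical names chosen for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    speed: float  # hypothetical unit: input blocks processed per time unit


def schedule(nodes, block_locations, transfer_cost=1.0):
    """Greedy, locality-first assignment of one map task per input block.

    block_locations maps a block id to the set of node names holding a
    replica. transfer_cost is a hypothetical fixed penalty for reading a
    block from another node. Returns (makespan, number of non-local tasks).
    """
    pending = sorted(block_locations)  # deterministic task order
    # Min-heap of (time at which the node becomes free, node index).
    free_at = [(0.0, i) for i in range(len(nodes))]
    heapq.heapify(free_at)
    makespan, remote = 0.0, 0
    while pending:
        t, i = heapq.heappop(free_at)
        node = nodes[i]
        # Prefer a block that already has a replica on this node.
        local = [b for b in pending if node.name in block_locations[b]]
        if local:
            block, cost = local[0], 1.0 / node.speed
        else:
            # No local data left: take any pending block and pay to move it.
            block, cost = pending[0], 1.0 / node.speed + transfer_cost
            remote += 1
        pending.remove(block)
        finish = t + cost
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, i))
    return makespan, remote


# Blocks 0-3 are stored on node A and blocks 4-7 on node B (one replica each).
blocks = {b: {"A"} if b < 4 else {"B"} for b in range(8)}
print(schedule([Node("A", 1.0), Node("B", 1.0)], blocks))  # (4.0, 0)
print(schedule([Node("A", 2.0), Node("B", 1.0)], blocks))  # (3.5, 1)
```

With equal speeds every map task reads local data. When node A is twice as fast, it drains its own blocks first and then processes block 6 from node B, paying the transfer cost; this is the kind of remote read that prefetching is intended to hide. Real Hadoop schedulers also distinguish rack-local from off-rack tasks, which this sketch ignores.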
Pagination: XII, 136
URI: http://hdl.handle.net/10603/427502
Appears in Departments: Department of Computer Science

Files in This Item:
File                                         Description    Size       Format
01_title page.pdf                            Attached File  43.08 kB   Adobe PDF
02_prelim.pdf                                               252.55 kB  Adobe PDF
03_content.pdf                                              126.27 kB  Adobe PDF
04_abstract.pdf                                             120.09 kB  Adobe PDF
05_chapter1- introduction.pdf                               751.96 kB  Adobe PDF
06_chapter 2 literature survey.pdf                          786.5 kB   Adobe PDF
07_chapter 3 methodology.pdf                                554.62 kB  Adobe PDF
08_chapter 4 result and discussion.pdf                      421.99 kB  Adobe PDF
09_chapter 5 conclusion and future work.pdf                 216.71 kB  Adobe PDF
10_annexures.pdf                                            1.85 MB    Adobe PDF
80_recommendation.pdf                                       242.28 kB  Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
