Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/427502
Title: | Performance Improvement in Hadoop Mapreduce |
Researcher: | Kalia, Khushboo |
Guide(s): | Nagpal, Pooja and Neeraj Gupta |
Keywords: | Computer Science; Computer Science Artificial Intelligence; Engineering and Technology |
University: | K.R. Mangalam University, Gurgaon |
Completed Date: | 2022 |
Abstract: | The term Big Data was first used in 1970 in connection with atmospheric and deep-sea soundings. It refers to collections of data sets so large and complex that traditional tools cannot process them adequately. The MapReduce programming model, introduced by Google and implemented in the Hadoop framework, processes vast amounts of data in a parallel and distributed environment. The default Hadoop implementation assumes that the executing nodes are homogeneous. The simplicity of the model and the fault tolerance of the framework make it very popular for processing Big Data. As this programming model grows in popularity, the scheduling of jobs and the locality of data become very significant. Data locality is an important feature that Hadoop introduced to improve the performance of the model. The key idea is to move the map task to, or close to, the node where the actual data resides rather than transferring the vast data set to the computation. Data locality helps lower network congestion and improve performance. However, this practice fails when processing data in a heterogeneous Hadoop cluster. In a heterogeneous setup, nodes with different computational capabilities pose a crucial challenge: nodes with faster processing capacity finish their tasks sooner than nodes with slower processing ability. The objective of this dissertation is to propose a scheduling approach based on KNN clustering and prefetching. The process starts with speculative prefetching and then performs KNN clustering on the intermediate map output before directing it to the reducer for final processing. Scheduler performance is evaluated by executing different workloads such as word count, random text, random num, and Sort. The results show that the proposed approach improves the performance of job execution. |
Pagination: | XII, 136 |
URI: | http://hdl.handle.net/10603/427502 |
Appears in Departments: | Department of Computer Science |
Files in This Item:
File | Description | Size | Format
---|---|---|---
01_title page.pdf | Attached File | 43.08 kB | Adobe PDF
02_prelim.pdf | | 252.55 kB | Adobe PDF
03_content.pdf | | 126.27 kB | Adobe PDF
04_abstract.pdf | | 120.09 kB | Adobe PDF
05_chapter1- introduction.pdf | | 751.96 kB | Adobe PDF
06_chapter 2 literature survey.pdf | | 786.5 kB | Adobe PDF
07_chapter 3 methodology.pdf | | 554.62 kB | Adobe PDF
08_chapter 4 result and discussion.pdf | | 421.99 kB | Adobe PDF
09_chapter 5 conclusion and future work.pdf | | 216.71 kB | Adobe PDF
10_annexures.pdf | | 1.85 MB | Adobe PDF
80_recommendation.pdf | | 242.28 kB | Adobe PDF
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).