Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/442421
Title: Framework to improve performance of hadoop
Researcher: Balraj Singh
Guide(s): Harsh K Verma and H S Johal
Keywords: Computer Science
Computer Science Hardware and Architecture
Engineering and Technology
University: Dr B R Ambedkar National Institute of Technology Jalandhar
Completed Date: 2022
Abstract: The present era demands continuous improvement in processing large-scale data and in moving beyond traditional systems. The need to process diverse data types and to provide solutions for different industry domains is rising. These needs increase the demand for sophisticated techniques and methods that further enhance existing platforms and mechanisms. This gives the research community an opportunity to investigate present systems, identify potential issues, and propose new ways to improve them.

Hadoop is a popular choice for managing and processing Big Data. It is an open-source platform and a front runner in the batch processing of large-scale jobs. The cost of scaling a Hadoop cluster is low compared with other platforms. However, this popularity by no means guarantees high performance in all scenarios. With the continuous evolution of data and industrial requirements, it is imperative to investigate new methods and techniques that advance the existing system. The performance of a cluster depends largely on its job processing mechanisms and the policies associated with them. Although extensive studies and solutions have been proposed, performance bottlenecks in scheduling, load balancing, and content management persist. These challenges stem from the complexity of existing systems and their limited ability to understand the diverse and changing needs of jobs. The key issues to be addressed are scheduling, skew mitigation through load balancing, and efficient data splitting and merging. Few existing solutions address scheduling in terms of the trade-off between different parameters. The process of content splitting and merging has not been explored to a large extent. Skew mitigation solutions focus mostly on the Reduce side of MapReduce, while the Map side is rarely used for load balancing.

This thesis, a
Pagination: 
URI: http://hdl.handle.net/10603/442421
Appears in Departments: Department of Computer Science and Engineering

Files in This Item:
File                     Description     Size        Format
80_recommendation.pdf    Attached File   79.11 kB    Adobe PDF
abstract.pdf                             85.72 kB    Adobe PDF
bibliography.pdf                         221.07 kB   Adobe PDF
chapter 1.pdf                            558.16 kB   Adobe PDF
chapter 2.pdf                            321.67 kB   Adobe PDF
chapter 3.pdf                            697.55 kB   Adobe PDF
chapter 4.pdf                            925.67 kB   Adobe PDF
chapter 5.pdf                            406.52 kB   Adobe PDF
prelim.pdf                               909.75 kB   Adobe PDF
table of contents.pdf                    86.03 kB    Adobe PDF
title.pdf                                85.79 kB    Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
