Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/305313
Title: Efficient Data Deduplication for Big Data Storage System
Researcher: NARESH KUMAR
Guide(s): S.C. JAIN
Keywords: Computer Science
Computer Science Cybernetics
Engineering and Technology
University: Rajasthan Technical University, Kota
Completed Date: 2018
Abstract: The prime focus of this thesis is to optimize the deduplication system along two lines: adjusting the pertinent factors of content-defined chunking (CDC) that govern how chunk cut-points are declared, and providing efficient fingerprint lookup using on-disk secure bucket-based index partitioning. Firstly, a Differential Evolution (DE) based chunking approach, TTTD-P, is proposed to optimize the Two Thresholds Two Divisors (TTTD) CDC algorithm; it significantly reduces the number of computing operations by using a single dynamic optimal divisor D with an optimal threshold value T, exploiting the multi-operation nature of TTTD. To reduce chunk-size variance, the TTTD algorithm introduces an additional backup divisor D' that has a higher probability of finding cut-points. However, the additional divisor D' decreases chunking throughput, meaning that TTTD aggravates the performance bottleneck of Rabin's CDC. To this end, Asymmetric Extremum (AE) CDC significantly improves chunking throughput while providing comparable deduplication efficiency by using the local extreme value in a variable-sized asymmetric window, overcoming the chunking problems of Rabin CDC and TTTD. After AE, the efficient FastCDC approach was developed using fast gear-based hashing. However, AE and FastCDC improve chunking throughput only and suffer from a reduced deduplication ratio (DR); maximizing storage space by eliminating redundant data remains the prime objective of today's big data storage systems, which use Hadoop technology in cloud computing to accommodate massive volumes of data. Secondly, the fingerprint-generation stage of data deduplication uses the cryptographically secure hash function SHA-1 to secure big data storage using a key-value store.
The key is a fingerprint and the value points to the data chunk. Moreover, deduplication technology also faces the technical challenge of the duplicate-lookup disk bottleneck when storing the complete index of data chunks with their fingerprints.
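As context for the two stages the abstract describes, the following is a minimal sketch of gear-based content-defined chunking (in the spirit of FastCDC) combined with SHA-1 fingerprint lookup in an in-memory key-value index. It is not the thesis's TTTD-P or bucket-partitioned implementation; the gear table seed, cut-point mask, and chunk-size bounds are illustrative assumptions.

```python
import hashlib
import random

random.seed(0)
GEAR = [random.getrandbits(64) for _ in range(256)]  # random 64-bit gear table
MASK = 0x0000D93003530000        # 13-bit sparse mask: expected gap ~8 KiB (assumption)
MIN_SIZE, MAX_SIZE = 2048, 65536  # enforced chunk-size bounds (assumption)

def gear_chunks(data: bytes):
    """Yield variable-sized chunks; a cut-point is declared when the
    rolling gear hash ANDed with MASK is zero."""
    start = h = 0
    i, n = 0, len(data)
    while i < n:
        h = ((h << 1) + GEAR[data[i]]) & 0xFFFFFFFFFFFFFFFF
        i += 1
        size = i - start
        if size < MIN_SIZE:
            continue                      # skip cut-points below the minimum
        if (h & MASK) == 0 or size >= MAX_SIZE:
            yield data[start:i]
            start, h = i, 0               # reset hash at each cut-point
    if start < n:
        yield data[start:]                # final partial chunk

def deduplicate(data: bytes):
    """Store only unique chunks keyed by SHA-1 fingerprint; return the
    index and the ordered fingerprint recipe for reconstruction."""
    index = {}    # fingerprint -> chunk (stands in for the on-disk store)
    recipe = []   # ordered fingerprints to rebuild the original stream
    for chunk in gear_chunks(data):
        fp = hashlib.sha1(chunk).hexdigest()
        if fp not in index:               # duplicate-lookup step
            index[fp] = chunk
        recipe.append(fp)
    return index, recipe
```

The original stream is reconstructed as `b"".join(index[fp] for fp in recipe)`; in a real system the index lives on disk, which is precisely the duplicate-lookup bottleneck the abstract refers to.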
Pagination: 5094
URI: http://hdl.handle.net/10603/305313
Appears in Departments:Computer Engineering

Files in This Item:
File                      Size       Format
01_title.pdf              104.85 kB  Adobe PDF
02_certificate.pdf        1.87 MB    Adobe PDF
03_preliminary pages.pdf  400.2 kB   Adobe PDF
04_chapter01.pdf          583.66 kB  Adobe PDF
04_chapter02.pdf          1.75 MB    Adobe PDF
04_chapter03.pdf          1.13 MB    Adobe PDF
04_chapter04.pdf          832.37 kB  Adobe PDF
04_chapter05.pdf          86.3 kB    Adobe PDF
04_chapter06.pdf          421.05 kB  Adobe PDF
80_recommendation.pdf     189.84 kB  Adobe PDF


Items in Shodhganga are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) licence.
