Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/287086
Title: An Efficient In Line Data Deduplication Method for Cloud Services
Researcher: Venish A
Guide(s): Siva Sankar K
Keywords: Engineering and Technology, Computer Science, Computer Science Information Systems
University: Noorul Islam Centre for Higher Education
Completed Date: 18/05/2018
Abstract: Digital data growth in all cloud models needs special care to provide seamless data processing. Data handling requires more storage space, incurs high cost, and needs manual intervention for data handling and disaster recovery. Data transfer also uses more network bandwidth and can hurt system performance. To resolve these challenges, deduplication removes duplicate data and stores only the unique data, which reduces the data handling overhead.
The replacement algorithm plays a vital role in fingerprint management in all deduplication systems. It decides how, and which set of elements, is moved from main memory to the Metadata disk. In many current data center deduplication systems, memory buffer management is handled by the Least Recently Used (LRU) replacement algorithm. LRU produces good results on data with a strong locality pattern and is easy to implement compared to other replacement algorithms. However, LRU fails to produce good results on data with a weak locality pattern, and it requires heavy Input/Output (I/O) resource access for Metadata search. This issue can be addressed by the Low Inter-reference Recency Set (LIRS) algorithm, used here for the first time in data deduplication.
First, the work proposes a new architecture that incorporates a Metadata manager to maximize Metadata hits and reduce the cache miss penalty. It handles the indexing table efficiently and speeds up the lookup process by using a B-Tree. Second, it implements a heuristic-based replacement algorithm as the cache replacement policy to achieve maximum throughput for weak-locality data patterns.
Last, as an extended approach, the Data Relationship Manager (DRM) is integrated with the replacement algorithm to reduce widespread searching of the Metadata disk. The DRM knows the location of each element stored on the Metadata disk.
Further, the work extensively analyses the LRU and LIRS replacement algorithms to verify the results for different data patterns, both weak and strong. It also analyses the Metadata caching space by configuring three sets of Inter-reference Recency (IRR) values. To evaluate system performance and behavior, the system is tested with real-world data. The tests confirm that the system performs well with different data patterns, especially weak-locality patterns, and that it reduces both the Metadata lookup overhead and I/O resource utilization. The results show a 30% increase in the hit ratio, a 25% improvement in the total time taken for deduplication, and a 15% time reduction when changing the cache size of the deduplication system.
The test data sample contains 4000 files of about 256 KB each. To compare the performance of the two algorithms, the evaluation uses four sets of results with five different data patterns. The results clearly indicate that LIRS is better than LRU for weak-locality workloads. Its one disadvantage relative to LRU is higher computational complexity. However, the total time taken by LIRS for the entire deduplication process is much lower than that of LRU. Its major advantages are that memory buffer management in LIRS is more efficient and that it needs fewer disk accesses than LRU to search the Metadata.
The LIRS with DRM implementation also provides good results in the deduplication system.
Whenever an element search misses in the LIRS history tree, the DRM helps to identify the disk location of the Metadata. The DRM is integrated with the deduplication system and compared with LIRS alone. The workload, time and Double Data Rate (DDR) size analyses show that LIRS-DRM improves the search results and significantly reduces Metadata disk access. The experimental results also confirm a 15% performance improvement when the DRM is integrated with LIRS.
The computational complexity of the system was mainly in the hash searching mech
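The LRU weakness that the abstract describes can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the names `LRUFingerprintCache`, `DedupStore` and `run` are hypothetical, SHA-256 fingerprints and pre-cut chunks are assumptions, and a plain dictionary stands in for the Metadata disk. LIRS, the B-Tree index and the DRM are not modeled. The sketch only shows that an LRU fingerprint cache serves repeated chunks from memory when locality is strong, but falls back to a Metadata disk lookup on every write when a looping pattern is slightly larger than the cache.

```python
import hashlib
from collections import OrderedDict


class LRUFingerprintCache:
    """In-memory fingerprint cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # fingerprint -> chunk location
        self.hits = 0
        self.misses = 0

    def get(self, fingerprint):
        if fingerprint in self.entries:
            self.entries.move_to_end(fingerprint)  # mark as most recently used
            self.hits += 1
            return self.entries[fingerprint]
        self.misses += 1
        return None

    def put(self, fingerprint, location):
        self.entries[fingerprint] = location
        self.entries.move_to_end(fingerprint)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


class DedupStore:
    """In-line deduplication: each chunk is fingerprinted before it is stored."""

    def __init__(self, cache_capacity):
        self.cache = LRUFingerprintCache(cache_capacity)
        self.metadata_disk = {}  # assumption: stand-in for the on-disk index
        self.chunks = []         # assumption: stand-in for unique chunk storage
        self.disk_lookups = 0

    def write(self, chunk):
        fingerprint = hashlib.sha256(chunk).hexdigest()
        location = self.cache.get(fingerprint)
        if location is not None:
            return location  # duplicate resolved from memory
        self.disk_lookups += 1  # cache miss: consult the Metadata disk
        location = self.metadata_disk.get(fingerprint)
        if location is None:  # chunk is new, so store it once
            location = len(self.chunks)
            self.chunks.append(chunk)
            self.metadata_disk[fingerprint] = location
        self.cache.put(fingerprint, location)
        return location


def run(pattern, cache_capacity):
    """Write every chunk of the pattern and return the resulting store."""
    store = DedupStore(cache_capacity)
    for chunk in pattern:
        store.write(chunk)
    return store


if __name__ == "__main__":
    strong = run([b"a", b"a", b"a", b"b", b"b", b"b"], cache_capacity=2)
    weak = run([b"c0", b"c1", b"c2"] * 4, cache_capacity=2)
    print(strong.cache.hits, strong.disk_lookups)  # 4 2
    print(weak.cache.hits, weak.disk_lookups)      # 0 12
```

In the looping case each chunk is evicted just before it is needed again, so every write costs a Metadata disk lookup even though only three unique chunks are stored. This is the kind of weak-locality behavior that a policy such as LIRS is meant to handle better.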
Pagination: 120
URI: http://hdl.handle.net/10603/287086
Appears in Departments: Department of Computer Science and Engineering

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
