A Novel Technique for Efficient Storage and Retrieval of Massive Data Sets

Singh, Amritpal

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/303952

Title:	A Novel Technique for Efficient Storage and Retrieval of Massive Data Sets
Researcher:	Singh, Amritpal
Guide(s):	Batra, Shalini
Keywords:	Bloom Filter; Probabilistic data structures; Quotient Filter
University:	Thapar Institute of Engineering and Technology
Completed Date:	2018
Abstract:	In today's world, data is considered one of the most valuable assets. With the emergence of a plethora of web applications and technologies such as sensors, IoT and cloud computing, the sources of streaming data have increased exponentially. Data originating from heterogeneous sources and real-world applications is highly susceptible to being inconsistent, incomplete and noisy. To support data applications in different domains, data processing must be efficient and as automated as possible. Further, timely and accurate analysis of the available data is an intrinsic requirement. Conventional databases and traditional data mining techniques are efficient for analyzing stored data, but for streaming data, which arrives continuously, it is not feasible to first store the data in a database and then analyze it, since such applications demand time-bound query output. Moreover, traditional approaches require the entire data set to be stored in a structured format. Massive data sets require architectures and tools for storing, handling, processing and mining bulk information in limited time and in a single pass. One available alternative is the use of Probabilistic Data Structures (PDS) in big data analytics, which rely on probability-based approaches, approximation principles and hashing methods to reduce the time and space costs of storing, retrieving and searching data. This thesis proposes three techniques for streamed data analysis. The first, a variant of the scalable Bloom Filter (BF) called AdapTable Bloom Filter (ATBF), performs peak-hour analysis and decides the size of the dynamic BF a priori using a Kalman filter and a Learning Array (LA). The second, a variant of the stable BF called FingerPrint Stable Bloom Filter (FPSBF), is proposed for duplicate detection in streamed data.
In the third approach, a semi-supervised technique for spam detection on Twitter is proposed, which employs an ensemble-based framework (Eb-SDF) comprising four classifiers.
Pagination:	146p.
URI:	http://hdl.handle.net/10603/303952
Appears in Departments:	Department of Computer Science and Engineering
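
The abstract describes its Bloom filter variants only at a high level. As background, the sketch below shows the two standard structures that ATBF and FPSBF build on: a classic Bloom filter sized with the textbook formulas m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions, and a stable Bloom filter in the style of Deng and Rafiei (2006) for approximate duplicate detection on an unbounded stream. It is a minimal illustrative sketch under stated assumptions, not the thesis's ATBF or FPSBF: it uses no Kalman filter, Learning Array or fingerprints, and the hashing scheme (double hashing over SHA-256), all function and class names, and all parameter values are hypothetical choices made for illustration.

```python
import hashlib
import math
import random


def optimal_bloom_params(n: int, p: float) -> tuple[int, int]:
    """Textbook sizing for n items at false-positive rate p:
    m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round((m / n) * math.log(2)))
    return m, k


def _positions(item: str, k: int, size: int) -> list[int]:
    """Derive k cell indices from one SHA-256 digest via double hashing
    (an illustrative assumption, not taken from the thesis)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
    return [(h1 + i * h2) % size for i in range(k)]


class BloomFilter:
    """Classic Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, expected_items: int, fp_rate: float):
        self.m, self.k = optimal_bloom_params(expected_items, fp_rate)
        self.bits = bytearray(self.m)  # one byte per bit, for readability

    def add(self, item: str) -> None:
        for pos in _positions(item, self.k, self.m):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in _positions(item, self.k, self.m))


class StableBloomFilter:
    """Stable Bloom filter after Deng and Rafiei (2006).

    Each cell is a small counter. On every insertion, p cells chosen at
    random are decremented before the item's k cells are set to max_value,
    so old items fade out and memory stays fixed on an unbounded stream.
    """

    def __init__(self, cells: int, k: int, p: int, max_value: int = 3, seed: int = 0):
        self.cells = [0] * cells
        self.k, self.p, self.max_value = k, p, max_value
        self.rng = random.Random(seed)  # seeded for reproducibility

    def seen_then_add(self, item: str) -> bool:
        """Return True if item looks like a duplicate, then record it."""
        positions = _positions(item, self.k, len(self.cells))
        duplicate = all(self.cells[pos] > 0 for pos in positions)
        for _ in range(self.p):  # random decay step
            idx = self.rng.randrange(len(self.cells))
            if self.cells[idx] > 0:
                self.cells[idx] -= 1
        for pos in positions:
            self.cells[pos] = self.max_value
        return duplicate


if __name__ == "__main__":
    bf = BloomFilter(expected_items=1000, fp_rate=0.01)
    for i in range(1000):
        bf.add(f"user-{i}")
    print(bf.m, bf.k)  # 9586 7
    sbf = StableBloomFilter(cells=10_000, k=3, p=10)
    print(sbf.seen_then_add("tweet-42"), sbf.seen_then_add("tweet-42"))  # False True
```

In this sketch, StableBloomFilter reports False the first time it sees an item and True if the same item reappears soon after. Items not seen for a long time decay to zero and are forgotten, so the structure accepts some false negatives (alongside the usual false positives) in exchange for fixed memory on an endless stream.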

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	90.84 kB	Adobe PDF
02_contents.pdf		66.54 kB	Adobe PDF
03_list of figures.pdf		52.35 kB	Adobe PDF
04_list of tables.pdf		47.61 kB	Adobe PDF
05_list of algorithms.pdf		47.84 kB	Adobe PDF
06_list of abbreviations.pdf		46.93 kB	Adobe PDF
07_certificate.pdf		504.24 kB	Adobe PDF
08_acknowledgements.pdf		60.93 kB	Adobe PDF
09_abstract.pdf		61.44 kB	Adobe PDF
10_chapter 1.pdf		1.87 MB	Adobe PDF
11_chapter 2.pdf		1.18 MB	Adobe PDF
12_chapter 3.pdf		807.45 kB	Adobe PDF
13_chapter 4.pdf		2.44 MB	Adobe PDF
14_chapter 5.pdf		3.53 MB	Adobe PDF
15_bibliography.pdf		133.95 kB	Adobe PDF
16_list of publications.pdf		65.87 kB	Adobe PDF
80_recommendation.pdf		130.57 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET