Investigation on indexing algorithms for big data retrieval

Gayathiri N R

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/301891

Title:	Investigation on indexing algorithms for big data retrieval
Researcher:	Gayathiri N R
Guide(s):	Natarajan A M
Keywords:	Big data, MongoDB, NoSQL, B-Tree indexing
University:	Anna University
Completed Date:	2019
Abstract:	Big Data and its implications have gained recognition in many domains, of which healthcare is emerging as one of the most promising. Healthcare and the biomedical sciences have rapidly become data-intensive as investigators generate and use large, complex, high-dimensional, and diverse domain-specific datasets. Because of the diverse data formats, huge volume, and uncertainty associated with Big Data sources, retrieving data from them is a vital task. Data retrieval is the process of using a query to extract data from a huge data source, particularly a large database. Indexing is one of the important aspects of a retrieval system: indexing structures are data structures that aim to reduce the number of comparisons and consequently the search time. The proposed approach uses hash-based indexing schemes, which vary according to their hashing function, to shorten the search process and minimize retrieval time. Hashing is chosen because it outperforms trees when the input is large, for example in the billions of records. Two popular Big Data platforms, MongoDB (a NoSQL document database) and Hadoop (a distributed computing framework), are used for data storage and fast processing. In the first phase of the work, the most widely used indexes, B-Tree indexing and hash indexing, are introduced for handling Big Data. The B-Tree indexing method and a hash algorithm written in the Java programming language are analyzed in the MongoDB database. The time complexity of the hashing algorithm is O(1), whereas that of the B-Tree is O(log n); as the number of records increases, the execution time for B-Tree indexing gradually increases. Hashing is efficient when there are many records, say in the billions, whereas the B-Tree works well for a limited number of records in distributed sharded and unsharded databases.
Pagination:	xvii,172p.
URI:	http://hdl.handle.net/10603/301891
Appears in Departments:	Faculty of Information and Communication Engineering
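
Illustrative note: the O(1) versus O(log n) contrast stated in the abstract can be sketched in a few lines of Python. This is not the thesis implementation (which the abstract says was written in Java and evaluated in MongoDB). It assumes a Python dict as the hash index and a sorted list with binary search as a simplified stand-in for a B-Tree's ordered keys; the record layout, the `patient_id` field, and all function names are hypothetical.

```python
import math

def build_hash_index(records, key):
    """Map each key value to its record position (keys assumed unique)."""
    return {rec[key]: pos for pos, rec in enumerate(records)}

def build_sorted_index(records, key):
    """Sorted (key, position) pairs: a simplified stand-in for B-Tree key order."""
    return sorted((rec[key], pos) for pos, rec in enumerate(records))

def sorted_lookup(index, value):
    """Binary search. Returns (position or None, number of entries probed)."""
    lo, hi, probes = 0, len(index) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        k, pos = index[mid]
        if k == value:
            return pos, probes
        if k < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

# Hypothetical sample data: 100,000 records with unique integer keys.
records = [{"patient_id": i * 7, "name": f"p{i}"} for i in range(100_000)]
hash_index = build_hash_index(records, "patient_id")
sorted_index = build_sorted_index(records, "patient_id")

target = 7 * 54_321
hash_pos = hash_index.get(target)                        # one hash probe on average
tree_pos, probes = sorted_lookup(sorted_index, target)   # about log2(n) probes
max_probes = math.floor(math.log2(len(records))) + 1
```

In this sketch the ordered lookup never needs more than `max_probes` probes, a bound that grows logarithmically with the number of records, while the dict lookup cost stays roughly constant on average. Actual timings in a database depend on storage, caching, and sharding, which this toy example does not model.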

Files in This Item:

File	Description	Size	Format
01_title.pdf.pdf	Attached File	10.03 kB	Adobe PDF
02_certificates.pdf.pdf		827.12 kB	Adobe PDF
03_abstracts.pdf.pdf		92.15 kB	Adobe PDF
04_acknowledgements.pdf.pdf		5.29 kB	Adobe PDF
05_contents.pdf.pdf		178.35 kB	Adobe PDF
06_list_of_tables.pdf.pdf		5.11 kB	Adobe PDF
07_list_of_figures.pdf.pdf		89.89 kB	Adobe PDF
08_list_of_abbreviations.pdf		190.77 kB	Adobe PDF
09_chapter1.pdf.pdf		330.76 kB	Adobe PDF
10_chapter2.pdf.pdf		199.69 kB	Adobe PDF
11_chapter3.pdf.pdf		165.4 kB	Adobe PDF
12_chapter4.pdf.pdf		235.55 kB	Adobe PDF
13_chapter5.pdf.pdf		624.95 kB	Adobe PDF
14_conclusion.pdf.pdf		13.87 kB	Adobe PDF
15_references.pdf.pdf		135.06 kB	Adobe PDF
16_list_of_publications.pdf		88.27 kB	Adobe PDF
80_recommendation.pdf		132.39 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET