Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/301891
Title: Investigation on indexing algorithms for big data retrieval
Researcher: Gayathiri N R
Guide(s): Natarajan A M
Keywords: Big data
MongoDB a NoSQL
B-Tree indexing
University: Anna University
Completed Date: 2019
Abstract: Big Data and its implications have gained recognition in many fields, among which healthcare has emerged as one of the most promising sectors. Healthcare and the biomedical sciences have rapidly become data-intensive, as investigators generate and use large, complex, high-dimensional and diverse domain-specific datasets. Because Big Data sources come in diversified formats, in huge volumes and with associated uncertainty, retrieving data from them is a vital task. Data retrieval is the process of using a query to extract data from a huge source of data, particularly a large database. Indexing is one of the important aspects of a retrieval system: indexing structures are data structures that aim to reduce the number of comparisons and, consequently, the search time. The proposed approach uses hash-based indexing schemes, which vary according to their hashing function, to shorten the search process and minimize retrieval time. Hashing is chosen because it outperforms trees when the input is large, for example in the billions of records. Two popular Big Data platforms, MongoDB (a NoSQL document database) and Hadoop (a distributed computing framework), are used for data storage and fast processing. In the first phase of the work, the most widely used indexes, B-Tree indexing and hash indexing, are introduced for handling Big Data. The B-Tree indexing method and a hash algorithm written in the Java programming language are analyzed in the MongoDB database. The time complexity of the hashing algorithm is O(1), whereas the time complexity of the B-Tree is O(log n); consequently, as the number of records increases, the execution time of B-Tree indexing gradually increases. Hashing is efficient when there are many records, say in the billions, whereas the B-Tree works well for a limited number of records in distributed sharded and unsharded databases.
Pagination: xvii,172p.
URI: http://hdl.handle.net/10603/301891
Appears in Departments: Faculty of Information and Communication Engineering
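
The abstract's comparison between hash-based lookup, O(1), and tree-based lookup, O(log n), can be illustrated with a minimal in-memory sketch. This is not the thesis's algorithm or its MongoDB setup: the class name `IndexSketch`, the `record-N` payloads, and the use of `java.util.HashMap` and `java.util.TreeMap` are assumptions made for illustration only. In particular, `TreeMap` is a red-black tree and stands in here for a B-Tree only because both are ordered structures with logarithmic lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: in-memory stand-ins for a hash index and a tree index.
public class IndexSketch {

    // Hash index: expected constant-time equality lookup, no key ordering.
    static Map<Long, String> buildHashIndex(int n) {
        Map<Long, String> index = new HashMap<>(n * 2);
        for (long id = 0; id < n; id++) {
            index.put(id, "record-" + id); // hypothetical payload
        }
        return index;
    }

    // Tree index: logarithmic lookup, keys kept in sorted order.
    // TreeMap is a red-black tree, used here as a stand-in for a B-Tree.
    static TreeMap<Long, String> buildTreeIndex(int n) {
        TreeMap<Long, String> index = new TreeMap<>();
        for (long id = 0; id < n; id++) {
            index.put(id, "record-" + id);
        }
        return index;
    }

    public static void main(String[] args) {
        int n = 200_000;
        Map<Long, String> hash = buildHashIndex(n);
        TreeMap<Long, String> tree = buildTreeIndex(n);

        // Equality lookup: both structures return the same record.
        System.out.println(hash.get(12_345L));
        System.out.println(tree.get(12_345L));

        // Range query: only the ordered structure supports it directly.
        System.out.println(tree.subMap(10L, true, 14L, true).size());
    }
}
```

The sketch also shows the usual trade-off behind the choice of index: the hash structure answers equality lookups without traversing a tree, while the ordered structure additionally supports range queries such as `subMap`. It makes no claim about the timings reported in the thesis.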

Files in This Item:
File | Description | Size | Format
01_title.pdf.pdf | Attached File | 10.03 kB | Adobe PDF
02_certificates.pdf.pdf | | 827.12 kB | Adobe PDF
03_abstracts.pdf.pdf | | 92.15 kB | Adobe PDF
04_acknowledgements.pdf.pdf | | 5.29 kB | Adobe PDF
05_contents.pdf.pdf | | 178.35 kB | Adobe PDF
06_list_of_tables.pdf.pdf | | 5.11 kB | Adobe PDF
07_list_of_figures.pdf.pdf | | 89.89 kB | Adobe PDF
08_list_of_abbreviations.pdf | | 190.77 kB | Adobe PDF
09_chapter1.pdf.pdf | | 330.76 kB | Adobe PDF
10_chapter2.pdf.pdf | | 199.69 kB | Adobe PDF
11_chapter3.pdf.pdf | | 165.4 kB | Adobe PDF
12_chapter4.pdf.pdf | | 235.55 kB | Adobe PDF
13_chapter5.pdf.pdf | | 624.95 kB | Adobe PDF
14_conclusion.pdf.pdf | | 13.87 kB | Adobe PDF
15_references.pdf.pdf | | 135.06 kB | Adobe PDF
16_list_of_publications.pdf | | 88.27 kB | Adobe PDF
80_recommendation.pdf | | 132.39 kB | Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
