Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/250807
Full metadata record
DC Field: Value
dc.coverage.spatial: Efficient Hybrid Distributed Document Clustering Method for Large Datasets
dc.date.accessioned: 2019-07-16T10:37:25Z
dc.date.available: 2019-07-16T10:37:25Z
dc.identifier.uri: http://hdl.handle.net/10603/250807
dc.description.abstract: The growing volume of data to be analyzed poses new challenges for data mining methodologies. Conventional data mining techniques such as clustering assume centralized operation on data and are computationally expensive in terms of execution time. Clustering of large datasets has received considerable attention over the past few decades in application areas such as document categorization and retrieval.
This thesis deals with improving the performance of clustering for large, high-dimensional, distributed document datasets. The challenges addressed are the initial centroids problem and the dimensionality problem. They are addressed using the emerging Hadoop MapReduce model for distributed storage and analysis, which supports processing of large document datasets and serves as the basis for the distributed clustering algorithms developed here. The thesis proposes three methods for distributed clustering: MapReduce K-Means (MR-KMeans) based distributed document clustering, a distributed document clustering method based on MapReduce PSO-KMeans (MR-PKMeans), and a hybrid distributed document clustering method (MR-Hybrid). Intensive evaluations are performed, resulting in optimized and semantically related document clusters with high quality and speedup.
In the MapReduce K-Means (MR-KMeans) based distributed document clustering method, the algorithm is modeled with an efficient similarity measure on the Hadoop framework, with the main objective of improving the clustering quality and speedup of the localized clustering solution. This method starts from random initial centroids and converges to locally optimized clusters. The stages of the clustering process (similarity calculation, assignment of documents to clusters, and recalculation of cluster centroids) are all based on the MapReduce methodology. Results on large document datasets show that such a framework with an efficient method of determ
dc.format.extent: 151
dc.language: English
dc.relation: 135
dc.rights: university
dc.title: Efficient Hybrid Distributed Document Clustering Method for Large Datasets
dc.title.alternative: -
dc.creator.researcher: Judith J.E
dc.subject.keyword: Engineering and Technology, Computer Science, Computer Science Artificial Intelligence
dc.description.note: Efficient Hybrid, Distributed Document, Clustering Method, Large Datasets
dc.contributor.guide: Jayakumari J
dc.publisher.place: Kanyakumari
dc.publisher.university: Noorul Islam Centre for Higher Education
dc.publisher.institution: Department of Computer Science and Engineering
dc.date.registered: 02/08/2010
dc.date.completed: 10/09/2015
dc.date.awarded: 31/08/2016
dc.format.dimensions: A4
dc.format.accompanyingmaterial: DVD
dc.source.university: University
dc.type.degree: Ph.D.
Appears in Departments: Department of Computer Science and Engineering
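
The MR-KMeans stages named in the abstract above (similarity calculation, assignment of documents to clusters, and recalculation of cluster centroids) can be illustrated with a minimal in-memory sketch. This is not the thesis's implementation: it runs in plain Python rather than on Hadoop, it assumes cosine similarity because the abstract does not name the similarity measure, and every function name here is hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts).

    Assumption: the abstract does not say which similarity measure is used.
    """
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_assign(doc, centroids):
    """Map step: emit (index of the most similar centroid, document vector)."""
    best = max(range(len(centroids)), key=lambda i: cosine(doc, centroids[i]))
    return best, doc


def reduce_centroid(docs):
    """Reduce step: average all document vectors assigned to one cluster."""
    total = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            total[term] += weight
    return {term: weight / len(docs) for term, weight in total.items()}


def kmeans_iteration(docs, centroids):
    """One K-Means iteration phrased as map, group-by-key, reduce."""
    groups = defaultdict(list)
    for doc in docs:
        key, value = map_assign(doc, centroids)
        groups[key].append(value)  # stands in for the shuffle phase
    # A cluster that received no documents keeps its previous centroid.
    return [
        reduce_centroid(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


if __name__ == "__main__":
    documents = [{"a": 1.0}, {"a": 3.0}, {"b": 2.0}]
    initial = [{"a": 1.0}, {"b": 1.0}]
    print(kmeans_iteration(documents, initial))
```

In an actual Hadoop job the grouping by cluster index would be done by the framework's shuffle, and the iteration would be repeated until the centroids stop changing; the sketch only shows how the three stages map onto the two MapReduce roles.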

Files in This Item:
File | Description | Size | Format
1 front.pdf | Attached File | 183.2 kB | Adobe PDF
3 bonafide certificate.pdf | | 123.26 kB | Adobe PDF
5 acknowledgements.pdf | | 83.39 kB | Adobe PDF
6 table of contents.pdf | | 107.39 kB | Adobe PDF
7 list of tables.pdf | | 162.36 kB | Adobe PDF
8 list of figures.pdf | | 101.75 kB | Adobe PDF
9 list of abbreviations.pdf | | 87.67 kB | Adobe PDF
chapter iii.pdf | | 521.27 kB | Adobe PDF
chapter ii.pdf | | 234.33 kB | Adobe PDF
chapter i.pdf | | 116.23 kB | Adobe PDF
chapter iv.pdf | | 340.27 kB | Adobe PDF
chapter vii.pdf | | 2.43 MB | Adobe PDF
chapter vi.pdf | | 693.33 kB | Adobe PDF
chapter v.pdf | | 413.65 kB | Adobe PDF
references.pdf | | 106.59 kB | Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
