Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/229542
Title: Efficient detection and interpretation of clusters in high dimensional databases
Researcher: Mittal, Mamta
Guide(s): Sharma, R. K. and Singh, V. P.
Keywords: Cluster validation measures
Computer Science
K-means
Partition based clustering
Single pass clustering
University: Thapar Institute of Engineering and Technology
Completed Date: 2015
Abstract: The exponential growth of data resources has created a need for new techniques that can convert these data into useful information. Clustering is one of the data mining techniques that explore such data resources for hidden patterns, and many clustering algorithms are available in the literature. This research work focuses on partitioning-based methods and aims to develop clustering algorithms that can efficiently detect clusters in high-dimensional databases. Among partitioning-based methods, k-means and single pass clustering (SPC) are popular algorithms, but both have several limitations.

To overcome these limitations, a Modified Single Pass Clustering (MSPC) algorithm has been proposed in this work. It is built around a threshold similarity value that is not a user-defined parameter; instead, it is a function of the data objects that remain to be clustered. In our experiments, this threshold similarity value is taken as the mean or median of the pairwise distances among all data objects remaining to be clustered. To assess the performance of MSPC, five experiments with the k-means, SPC and MSPC algorithms have been carried out on artificial and real datasets.

Further, a deterministic algorithm, Adaptive Threshold based Clustering (ATC), has been proposed. Rather than selecting data objects randomly, it selects the farthest data objects. It clusters the data objects using a neighborhood distance, which is again an adaptive parameter rather than one specified by the user. ATC also uses a minimum support value, which prunes insignificant clusters. The performance of ATC is assessed on ten artificial and eight real datasets and compared with the existing k-means algorithm.

New separation and compactness measures have also been proposed in this research work. The proposed compactness measures are based on the arithmetic or geometric average of the maximum dispersion of the data objects along each dimension.
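The abstract describes the MSPC threshold only at a high level. The following Python sketch shows one way such a data-driven threshold could drive a single-pass-style loop; it is an illustration under stated assumptions, not the algorithm from the thesis. Assumptions: Euclidean distance; a seed object chosen at random in each round (the abstract implies random selection only by contrast with ATC); and every unclustered object within the threshold of the seed joins its cluster. The names `adaptive_threshold` and `mspc_sketch` are hypothetical.

```python
import random
from itertools import combinations
from math import dist
from statistics import mean, median


def adaptive_threshold(points, use_median=False):
    """Mean (or median) pairwise Euclidean distance among `points`.

    With fewer than two points there are no pairs, so 0.0 is returned.
    """
    pairs = [dist(p, q) for p, q in combinations(points, 2)]
    if not pairs:
        return 0.0
    return median(pairs) if use_median else mean(pairs)


def mspc_sketch(points, use_median=False, seed=None):
    """Hypothetical MSPC-style loop; not the thesis's exact algorithm.

    Each round recomputes the threshold from the objects still unclustered,
    picks one of them at random (assumption), and moves every unclustered
    object within the threshold of it into a new cluster.
    """
    rng = random.Random(seed)
    remaining = list(points)
    clusters = []
    while remaining:
        threshold = adaptive_threshold(remaining, use_median)
        centre = remaining[rng.randrange(len(remaining))]
        # The centre is at distance 0 from itself, so every round removes
        # at least one object and the loop terminates.
        clusters.append([p for p in remaining if dist(p, centre) <= threshold])
        remaining = [p for p in remaining if dist(p, centre) > threshold]
    return clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(mspc_sketch(data, seed=1))
```

Because the threshold is recomputed on the shrinking set of unclustered objects, it tightens as clusters are removed, so no user-supplied threshold is needed at any step.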
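ATC can be sketched in the same spirit, although the abstract does not define what the farthest objects are measured from, how the neighborhood distance is computed, or how the minimum support is expressed. In the hypothetical `atc_sketch` below, each of these is an assumption: the next seed is the unclustered object farthest from the centroid of the unclustered objects; the neighborhood distance is the mean pairwise distance of the unclustered objects; a cluster grows by absorbing any object within that distance of a current member; and `min_support` is a fraction of the dataset size below which a group is pruned. No randomness is involved, so repeated runs return the same result.

```python
from itertools import combinations
from math import dist
from statistics import mean


def atc_sketch(points, min_support=0.2):
    """Hypothetical ATC-style clustering; every rule below is an assumption.

    Returns (clusters, pruned) as lists of sorted index lists.
    """
    n = len(points)
    remaining = list(range(n))  # indices of objects not yet clustered
    clusters, pruned = [], []
    while remaining:
        # Adaptive neighbourhood distance: mean pairwise distance of the
        # unclustered objects, recomputed every round (no user input).
        pairs = [dist(points[i], points[j]) for i, j in combinations(remaining, 2)]
        radius = mean(pairs) if pairs else 0.0
        # Deterministic seed: the unclustered object farthest from their
        # centroid; max() keeps the first candidate on exact ties.
        dims = len(points[0])
        centre = [sum(points[i][d] for i in remaining) / len(remaining) for d in range(dims)]
        seed = max(remaining, key=lambda i: dist(points[i], centre))
        # Grow the cluster: absorb any unclustered object within `radius`
        # of an object already in the cluster.
        members, frontier = [seed], [seed]
        unvisited = [i for i in remaining if i != seed]
        while frontier:
            current = frontier.pop()
            near = [i for i in unvisited if dist(points[current], points[i]) <= radius]
            unvisited = [i for i in unvisited if i not in near]
            members.extend(near)
            frontier.extend(near)
        remaining = unvisited
        # Minimum support (assumed to be a fraction of n): prune small groups.
        group = sorted(members)
        (clusters if len(group) >= min_support * n else pruned).append(group)
    return clusters, pruned


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (30, 30)]
    print(atc_sketch(data, min_support=0.2))
```

On the seven points above, the isolated point (30, 30) is selected first as the farthest object and, being a single-object group below the support of 0.2 × 7, is pruned; the two three-point groups are returned as clusters.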
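The abstract does not define "maximum dispersion" precisely. The sketch below assumes it means, for each dimension, the largest absolute deviation of a cluster's objects from the cluster mean along that dimension, and then averages these per-dimension values arithmetically or geometrically. The function names are hypothetical, and under this reading lower values indicate a tighter cluster.

```python
from math import prod


def max_dispersion_per_dimension(cluster):
    """Assumed reading of "maximum dispersion": for each dimension, the
    largest absolute deviation of any object from the cluster mean."""
    dims = len(cluster[0])
    centre = [sum(p[d] for p in cluster) / len(cluster) for d in range(dims)]
    return [max(abs(p[d] - centre[d]) for p in cluster) for d in range(dims)]


def compactness(cluster, average="arithmetic"):
    """Average the per-dimension maximum dispersions (lower = tighter)."""
    spread = max_dispersion_per_dimension(cluster)
    if average == "arithmetic":
        return sum(spread) / len(spread)
    if average == "geometric":
        return prod(spread) ** (1.0 / len(spread))
    raise ValueError("average must be 'arithmetic' or 'geometric'")


if __name__ == "__main__":
    cluster = [(0, 0), (2, 8)]
    print(compactness(cluster), compactness(cluster, "geometric"))
```

For the cluster {(0, 0), (2, 8)} the per-dimension maximum dispersions are 1 and 4, giving an arithmetic average of 2.5 and a geometric average of 2.0. The geometric variant is zero whenever the cluster has no spread along some dimension, whereas the arithmetic variant is not.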
Pagination: xi, 134p.
URI: http://hdl.handle.net/10603/229542
Appears in Departments: Department of Computer Science and Engineering

Files in This Item:
File                          Description    Size       Format
file10(publications).pdf      Attached File  47.3 kB    Adobe PDF
file1(title).pdf                             82.97 kB   Adobe PDF
file2(certificate).pdf                       259.07 kB  Adobe PDF
file3(preliminary pages).pdf                 406.19 kB  Adobe PDF
file4(chapter 1).pdf                         350.28 kB  Adobe PDF
file5(chapter 2).pdf                         219.17 kB  Adobe PDF
file6(chapter 3).pdf                         226.42 kB  Adobe PDF
file7(chapter 4).pdf                         2.61 MB    Adobe PDF
file8(chapter 5).pdf                         138.37 kB  Adobe PDF
file9(chapter 6).pdf                         61.24 kB   Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
