Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/229542
Title: | Efficient detection and interpretation of clusters in high dimensional databases |
Researcher: | Mittal, Mamta |
Guide(s): | Sharma, R. K. and Singh, V. P. |
Keywords: | Cluster validation measures; Computer Science; K-means; Partition based clustering; Single pass clustering |
University: | Thapar Institute of Engineering and Technology |
Completed Date: | 2015 |
Abstract: | The exponential growth of data resources has necessitated new techniques that can convert them into useful information. Clustering is a data mining technique that investigates these data resources for hidden patterns. Many clustering algorithms are available in the literature. This research work focuses on partitioning-based methods and attempts to develop clustering algorithms that can efficiently detect clusters in high dimensional databases. Among partitioning-based methods, k-means and single pass clustering (SPC) are popular algorithms, but both have several limitations. To overcome these limitations, a Modified Single Pass Clustering (MSPC) algorithm has been proposed in this work. It revolves around a threshold similarity value that is not a user-defined parameter; instead, it is a function of the data objects left to be clustered. In our experiments, this threshold similarity value is taken as the mean/median of the pairwise distances of all data objects left to be clustered. To assess the performance of the MSPC algorithm, five experiments with the k-means, SPC and MSPC algorithms have been carried out on artificial and real datasets. Further, a deterministic algorithm, Adaptive Threshold based Clustering (ATC), has been proposed. It does not select data objects randomly; rather, it is based on selecting the farthest data objects. It uses a parameter, the neighborhood distance, to cluster the data objects; this too is an adaptive parameter rather than one specified by the user. Another parameter used in the ATC algorithm is the minimum support value, which prunes insignificant clusters. The performance of the ATC algorithm has also been assessed on ten artificial and eight real datasets, and it has been compared with the existing k-means algorithm. In this research work, new separation and compactness measures have also been proposed.
The proposed compactness measures are based on the arithmetic/geometric average of the maximum dispersion of data objects along each dimension. |
Pagination: | xi, 134p. |
URI: | http://hdl.handle.net/10603/229542 |
Appears in Departments: | Department of Computer Science and Engineering |
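The adaptive threshold behind MSPC can be illustrated with a short Python sketch. It assumes Euclidean distance, a randomly chosen seed object on each pass, and a rule that places every remaining object within the threshold of that seed into the new cluster. The helper names `adaptive_threshold` and `mspc_sketch` are hypothetical, and the thesis's actual assignment procedure may differ.

```python
import random
import statistics
from itertools import combinations
from math import dist


def adaptive_threshold(points, use_median=False):
    """Mean (or median) of the pairwise Euclidean distances among `points`."""
    if len(points) < 2:
        return 0.0
    distances = [dist(p, q) for p, q in combinations(points, 2)]
    return statistics.median(distances) if use_median else statistics.fmean(distances)


def mspc_sketch(points, use_median=False, seed=None):
    """Hypothetical single-pass loop whose threshold is recomputed on the
    objects still left to be clustered (illustrative, not the thesis code)."""
    rng = random.Random(seed)
    remaining = list(range(len(points)))
    clusters = []
    while remaining:
        # The threshold is a function of the unclustered objects, not a user input.
        threshold = adaptive_threshold([points[i] for i in remaining], use_median)
        centre = rng.choice(remaining)  # assumed random seed selection
        members = [i for i in remaining
                   if dist(points[i], points[centre]) <= threshold]
        clusters.append(members)
        member_set = set(members)
        remaining = [i for i in remaining if i not in member_set]
    return clusters
```

Recomputing the threshold on every pass is what removes the user-supplied parameter: as well-separated groups are removed, the mean distance among the remaining objects shrinks. The seed always lies within the threshold of itself, so each pass clusters at least one object and the loop terminates. The O(n²) pairwise computation per pass is a simplification that a practical implementation on large, high dimensional data would likely need to avoid.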
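One possible reading of the proposed compactness measure is sketched below in Python. The abstract does not define "maximum dispersion", so this sketch assumes it means the largest absolute deviation of any cluster member from the cluster mean along a given dimension. The function names `max_dispersion_per_dimension` and `compactness` are hypothetical.

```python
import math
import statistics


def max_dispersion_per_dimension(cluster):
    """For each dimension, the largest absolute deviation of any member
    from the cluster mean along that dimension (assumed definition)."""
    dims = len(cluster[0])
    result = []
    for d in range(dims):
        values = [point[d] for point in cluster]
        centre = statistics.fmean(values)
        result.append(max(abs(v - centre) for v in values))
    return result


def compactness(cluster, average="arithmetic"):
    """Arithmetic or geometric average of the per-dimension maximum
    dispersion; smaller values indicate a tighter cluster in this sketch."""
    spreads = max_dispersion_per_dimension(cluster)
    if average == "arithmetic":
        return statistics.fmean(spreads)
    if average == "geometric":
        # A zero spread in any dimension makes the geometric average zero.
        if any(s == 0 for s in spreads):
            return 0.0
        return math.exp(statistics.fmean(math.log(s) for s in spreads))
    raise ValueError("average must be 'arithmetic' or 'geometric'")
```

The geometric average is computed through logarithms to avoid overflow when many dimensions are multiplied together. It is more sensitive than the arithmetic average to a dimension in which the cluster is very tight.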
Files in This Item:
File | Description | Size | Format
---|---|---|---
file10(publications).pdf | Attached File | 47.3 kB | Adobe PDF
file1(title).pdf | | 82.97 kB | Adobe PDF
file2(certificate).pdf | | 259.07 kB | Adobe PDF
file3(preliminary pages).pdf | | 406.19 kB | Adobe PDF
file4(chapter 1).pdf | | 350.28 kB | Adobe PDF
file5(chapter 2).pdf | | 219.17 kB | Adobe PDF
file6(chapter 3).pdf | | 226.42 kB | Adobe PDF
file7(chapter 4).pdf | | 2.61 MB | Adobe PDF
file8(chapter 5).pdf | | 138.37 kB | Adobe PDF
file9(chapter 6).pdf | | 61.24 kB | Adobe PDF
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).