Efficient detection and interpretation of clusters in high dimensional databases

Mittal, Mamta

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/229542

Title:	Efficient detection and interpretation of clusters in high dimensional databases
Researcher:	Mittal, Mamta
Guide(s):	Sharma, R. K. and Singh, V. P.
Keywords:	Cluster validation measures; Computer Science; K-means; Partition based clustering; Single pass clustering
University:	Thapar Institute of Engineering and Technology
Completed Date:	2015
Abstract:	Exponential growth of data resources has necessitated new techniques that can convert them into useful information. Clustering is one of the data mining techniques that investigates these data resources for hidden patterns. Many clustering algorithms are available in the literature. This research work focuses on partition-based methods and attempts to develop clustering algorithms that can efficiently detect clusters in high-dimensional databases. Among partition-based methods, k-means and single pass clustering (SPC) are popular algorithms, but they have several limitations. To overcome these limitations, a Modified Single Pass Clustering (MSPC) algorithm has been proposed in this work. It is built around a threshold similarity value. This is not a user-defined parameter; instead, it is a function of the data objects that remain to be clustered. In our experiments, this threshold similarity value is taken as the mean or median of the pairwise distances among all data objects that remain to be clustered. To assess the performance of the MSPC algorithm, five experiments with the k-means, SPC and MSPC algorithms have been carried out on artificial and real datasets. Further, a deterministic algorithm, Adaptive Threshold based Clustering (ATC), has been proposed. It does not select data objects randomly; rather, it selects the farthest data objects. It uses a parameter, the neighborhood distance, to cluster the data objects. This is again an adaptive parameter and is not specified by the user. Another parameter used in the ATC algorithm is the minimum support value, which prunes insignificant clusters. The performance of the ATC algorithm has been assessed on ten artificial and eight real datasets, and it has been compared with the existing k-means algorithm. In this research work, new separation and compactness measures have also been proposed. The proposed compactness measures are based on the arithmetic or geometric average of the maximum dispersion of data objects along each dimension.
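
The abstract describes the MSPC threshold only in outline, so the following is a rough sketch of the idea rather than the thesis's algorithm. It assumes Euclidean distance, takes the first remaining object as the cluster seed, and assigns every remaining object within the threshold of that seed to the cluster; all three choices are illustrative assumptions, and the function names adaptive_threshold and threshold_single_pass are hypothetical. What it does take from the abstract is that the threshold is recomputed as the mean or median of the pairwise distances among the objects still left to be clustered.

```python
import numpy as np


def adaptive_threshold(points, use_median=False):
    """Mean (or median) of all pairwise Euclidean distances among `points`."""
    n = len(points)
    if n < 2:
        return 0.0
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep each unordered pair once (strict upper triangle).
    upper = dists[np.triu_indices(n, k=1)]
    return float(np.median(upper) if use_median else upper.mean())


def threshold_single_pass(points, use_median=False):
    """Illustrative clustering loop; NOT the thesis's exact MSPC procedure.

    Assumption: the first remaining object seeds each new cluster, and
    objects within the current threshold of the seed join that cluster.
    """
    points = np.asarray(points, dtype=float)
    remaining = np.arange(len(points))
    labels = np.full(len(points), -1)
    cluster_id = 0
    while remaining.size > 0:
        subset = points[remaining]
        # Threshold depends only on the objects not yet clustered.
        threshold = adaptive_threshold(subset, use_median)
        dist_to_seed = np.linalg.norm(subset - subset[0], axis=1)
        members = dist_to_seed <= threshold  # the seed is always a member
        labels[remaining[members]] = cluster_id
        remaining = remaining[~members]
        cluster_id += 1
    return labels


data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = threshold_single_pass(data)
```

On this toy input of two well-separated groups, the first threshold is dominated by the large between-group distances, so the seed captures only its own group; the threshold then shrinks once only the second group remains.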
Pagination:	xi, 134p.
URI:	http://hdl.handle.net/10603/229542
Appears in Departments:	Department of Computer Science and Engineering

Files in This Item:

File	Description	Size	Format
file10(publications).pdf	Attached File	47.3 kB	Adobe PDF
file1(title).pdf		82.97 kB	Adobe PDF
file2(certificate).pdf		259.07 kB	Adobe PDF
file3(preliminary pages).pdf		406.19 kB	Adobe PDF
file4(chapter 1).pdf		350.28 kB	Adobe PDF
file5(chapter 2).pdf		219.17 kB	Adobe PDF
file6(chapter 3).pdf		226.42 kB	Adobe PDF
file7(chapter 4).pdf		2.61 MB	Adobe PDF
file8(chapter 5).pdf		138.37 kB	Adobe PDF
file9(chapter 6).pdf		61.24 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET