Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/456650
Title: | Some Investigations on Clustering of Incomplete Data and its Applications |
Researcher: | Goel Sonia |
Guide(s): | Tushir Meena |
Keywords: | Engineering; Engineering and Technology; Engineering Electrical and Electronic |
University: | Guru Gobind Singh Indraprastha University |
Completed Date: | 2022 |
Abstract: | Clustering is an important and effectively applied technique for automatic knowledge extraction from information-rich data sets. Its purpose is to recognize patterns of similar features within a data set. Clustering strategies are used in numerous areas, including database marketing, information retrieval, bioinformatics, and medical sciences. When clustering strategies are applied to real-world data sets, a recurring problem is that some feature values are missing. Since conventional clustering strategies were designed to analyze complete data, clustering techniques that can deal with incomplete data are required. In this work, we investigated the impact of several imputation-based and non-imputation-based approaches on the clustering of incomplete data. We present a linear interpolation-based imputation combined with a modified fuzzy c-means (FCM) clustering algorithm to cluster incomplete data. Experimental results on various data sets show that the linear interpolation-based FCM clustering performs significantly better than the other imputation and non-imputation techniques. Traditional FCM clustering algorithms generally use Euclidean distance to measure the dissimilarity between objects, which suits clusters with convex shapes. However, the FCM clustering algorithm is not suitable for irregularly shaped clusters, and it is also sensitive to noise and isolated points. In a noisy environment, clustering algorithms based on Euclidean measures are not stable enough, and they are sensitive to the initial values of the algorithm, the shape of the clusters, and the size of the clusters. These problems can be partially controlled by changing the distance measure; the Mahalanobis distance is widely used for this purpose. 
In the second part of the thesis, we modified the clustering technique by incorporating the Mahalanobis distance metric for clustering of incomplete data. A fuzzy clustering algorithm based... |
Pagination: | 155p. |
URI: | http://hdl.handle.net/10603/456650 |
Appears in Departments: | University School of Information and Communication Technology |
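
The abstract describes imputing missing features by linear interpolation and then clustering with fuzzy c-means. The sketch below illustrates that general pipeline only; it is not the thesis's method. It assumes that rows have a meaningful order so that interpolation along each column makes sense, and it uses the standard Euclidean FCM update rather than the modified or Mahalanobis-based variants mentioned in the abstract, whose details are not given here. The function names `interpolate_missing` and `fuzzy_c_means` and the toy data are hypothetical.

```python
import numpy as np

def interpolate_missing(X):
    """Fill NaNs column by column using linear interpolation over row order.

    Assumption: row order is meaningful; the abstract does not say how
    the thesis chooses the interpolation neighbours.
    """
    X = np.asarray(X, dtype=float).copy()
    rows = np.arange(X.shape[0])
    for j in range(X.shape[1]):
        col = X[:, j]                      # view into X, edited in place
        known = ~np.isnan(col)
        if known.any():
            # np.interp holds the edge values constant outside the known range
            col[~known] = np.interp(rows[~known], rows[known], col[known])
    return X

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM with Euclidean distance (not the thesis's modified version)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)      # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)              # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Toy data: two well-separated groups, with one missing value in each group.
X = np.array([[1.0, 1.0], [1.2, np.nan], [0.8, 1.1],
              [5.0, 5.0], [np.nan, 5.2], [5.2, 4.8]])
X_filled = interpolate_missing(X)
centers, U = fuzzy_c_means(X_filled, c=2)
labels = U.argmax(axis=1)                  # hard labels from fuzzy memberships
```

Swapping the Euclidean norm for a covariance-aware distance such as the Mahalanobis distance is the kind of change the second part of the thesis refers to; that variant is not attempted in this sketch.
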
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
80_recommendation.pdf | Attached File | 217.74 kB | Adobe PDF | View/Open |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).