Some Investigations on Clustering of Incomplete Data and its Applications

Goel Sonia

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/456650

Title:	Some Investigations on Clustering of Incomplete Data and its Applications
Researcher:	Goel Sonia
Guide(s):	Tushir Meena
Keywords:	Engineering; Engineering and Technology; Engineering Electrical and Electronic
University:	Guru Gobind Singh Indraprastha University
Completed Date:	2022
Abstract:	Clustering is one of the most important and widely applied techniques for automatic knowledge extraction from large data sets. Its purpose is to recognize groups of objects with similar features within a data set. Clustering strategies are used in numerous areas, including database marketing, information retrieval, bioinformatics, and the medical sciences. A recurring issue when clustering strategies are applied to real data is that real-world data sets often contain missing feature values. Since conventional clustering strategies were designed for complete data, clustering techniques that can handle incomplete data are needed. In this work, we investigate the effect of several imputation-based and non-imputation-based approaches on the clustering of incomplete data. We present a linear interpolation-based imputation combined with a modified fuzzy c-means (FCM) clustering algorithm to cluster incomplete data. Experimental results on various data sets show that linear interpolation-based FCM clustering performs significantly better than the other imputation and non-imputation techniques. Traditional FCM clustering algorithms generally use the Euclidean distance to measure the dissimilarity between objects, which favors clusters with convex shapes. Consequently, FCM is not well suited to irregularly shaped clusters, and it is also sensitive to noise and isolated points. In noisy environments, clustering methods based on the Euclidean measure are not stable enough, and they are sensitive to the algorithm's initial values as well as to the shape and size of the clusters. These problems can be partially mitigated by changing the distance measure; the Mahalanobis distance is widely used for this purpose.
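The general pipeline of imputing missing values by linear interpolation and then running fuzzy c-means can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes missing entries are marked as NaN and that each feature column is interpolated over the row order (values before the first or after the last observed entry take the nearest observed value), and it uses the standard FCM updates with squared Euclidean distance rather than the modified FCM the thesis proposes. The function names `interpolate_missing` and `fuzzy_c_means`, the fuzzifier `m=2.0`, and the toy data are hypothetical.

```python
import numpy as np


def interpolate_missing(X):
    """Fill NaNs in each feature column by linear interpolation over the row index.

    Assumption (hypothetical): rows form an ordered sequence; entries outside the
    observed range take the nearest observed value (np.interp's edge behavior).
    """
    X = np.array(X, dtype=float)  # work on a copy
    idx = np.arange(X.shape[0])
    for j in range(X.shape[1]):
        col = X[:, j]  # view into X, so assignments below update X
        missing = np.isnan(col)
        if missing.all():
            raise ValueError(f"feature {j} has no observed values")
        if missing.any():
            col[missing] = np.interp(idx[missing], idx[~missing], col[~missing])
    return X


def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard FCM with squared Euclidean distance (not the thesis's modified FCM)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)  # memberships of each object sum to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # (c, d) cluster centers
        # Squared Euclidean distance of every object to every center, shape (c, n)
        D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        D2 = np.fmax(D2, 1e-12)  # guard against division by zero
        # u_ik = 1 / sum_j (d_ik^2 / d_jk^2)^(1/(m-1))
        inv = D2 ** (-1.0 / (m - 1))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U


# Toy data with two groups; NaN marks a missing feature value.
X = np.array([
    [1.0, 2.0], [1.2, np.nan], [0.8, 1.9],
    [8.0, 8.1], [np.nan, 7.9], [8.2, 8.0],
])
X_filled = interpolate_missing(X)  # X_filled[1, 1] is 1.95, X_filled[4, 0] is 8.1
V, U = fuzzy_c_means(X_filled, c=2)
labels = U.argmax(axis=0)  # hard labels from the largest membership
```

Interpolating over row order is only meaningful when neighboring rows are related, so the toy data are ordered such that each gap lies inside one group; on shuffled data that assumption would not hold.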
In the second part of the thesis, we modify the clustering technique by incorporating the Mahalanobis distance metric to cluster incomplete data. A fuzzy clustering algorithm based...
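The Mahalanobis distance replaces the squared Euclidean distance ||x - v||^2 with (x - v)^T S^-1 (x - v), so that distance is measured relative to a cluster's covariance matrix S. The sketch below is a hedged illustration only: it assumes complete (for example, already imputed) feature vectors and computes S as a membership-weighted fuzzy covariance, one common choice in Gustafson–Kessel-style clustering. How the thesis actually handles missing values inside its Mahalanobis-based algorithm is not stated in this abstract. The helper names `mahalanobis_sq` and `fuzzy_covariance` are hypothetical.

```python
import numpy as np


def mahalanobis_sq(X, v, cov):
    """Squared Mahalanobis distance of each row of X to center v under covariance cov."""
    diff = np.atleast_2d(X) - v  # (n, d)
    # Solve cov @ y = diff^T rather than forming the inverse explicitly.
    sol = np.linalg.solve(cov, diff.T)  # (d, n)
    return np.einsum("nd,dn->n", diff, sol)


def fuzzy_covariance(X, v, u, m=2.0):
    """Fuzzy covariance of one cluster: scatter around v weighted by memberships u^m.

    Hypothetical helper; assumes X has no missing values.
    """
    w = u ** m
    diff = X - v
    return (w[:, None] * diff).T @ diff / w.sum()


# An elongated cluster: variance 4 along the first axis, 0.25 along the second.
cov = np.array([[4.0, 0.0], [0.0, 0.25]])
v = np.zeros(2)
points = np.array([[2.0, 0.0], [0.0, 2.0]])

d_e = ((points - v) ** 2).sum(axis=1)   # squared Euclidean: [4.0, 4.0]
d_m = mahalanobis_sq(points, v, cov)    # squared Mahalanobis: [1.0, 16.0]
```

Both points lie at the same Euclidean distance from the center, but the Mahalanobis distance treats the point along the cluster's long axis as much closer, which is how the metric adapts to non-spherical cluster shapes. With an identity covariance it reduces to the squared Euclidean distance.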
Pagination:	155p.
Appears in Departments:	University School of Information and Communication Technology

Files in This Item:

File	Description	Size	Format
80_recommendation.pdf	Attached File	217.74 kB	Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).


Shodhganga : a reservoir of Indian theses @ INFLIBNET