Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/366490
Title: | An Investigation of Machine Learning Algorithm for Clustering |
Researcher: | Shrivastava, Shailendra Kumar |
Guide(s): | Jain, R.C. and Rana, J.L. |
Keywords: | Computer Science; Computer Science Software Engineering; Engineering and Technology |
University: | Rajiv Gandhi Proudyogiki Vishwavidyalaya |
Completed Date: | 2013 |
Abstract: | Clustering is an unsupervised learning method for finding groups in a given data set. The task of clustering is NP-hard. Generally, clustering a data set requires machine learning techniques, and clustering attracts considerable interest in the machine learning community. One of the more recent approaches to finding clusters is affinity propagation, which is based on exemplars. The input to affinity propagation is the set of similarities among data points. The output is a set of representative data points that best describe the clusters. These representative data points are known as exemplars, and the clusters are generated by assigning each non-exemplar data point to its nearest exemplar.
In this thesis, we have developed four algorithms based on affinity propagation and machine learning concepts, and carried out extensive experiments to evaluate their performance. The algorithms are Fast Affinity Propagation based on Machine Learning (FAPML), Phrase Affinity Clustering (PAC), K-Means based on Heterogeneous Transfer Learning (K-Means based on HTL), and Affinity Propagation based on Heterogeneous Transfer Learning (AP based on HTL).
FAPML is based on learning by experience, which is the principle of machine learning. FAPML tries to place data points into clusters based on the history of which clusters those data points belonged to in earlier stages. In FAPML we introduce an affinity learning constant and a dispersion constant, which supervise the clustering process. FAPML also enforces the exemplar consistency and one-of-N constraints.
PAC first finds phrases using Ukkonen's suffix tree construction algorithm, then builds the vector space model using a tf-idf weighting scheme over the phrases. After that, it calculates the similarity matrix from the VSD using cosine similarity. Affinity propagation is then used to generate the clusters.
In K-Means, the output clusters depend on the initialization of the centroids. K-Means based on HTL tries to solve this problem. |
Pagination: | 17.1MB |
URI: | http://hdl.handle.net/10603/366490 |
Appears in Departments: | Computer Science Engineering |
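The affinity propagation procedure summarized in the abstract (similarities in, exemplars and assignments out) can be sketched as follows. This is a minimal version of the standard message-passing formulation, not the thesis's FAPML variant, and it does not include the affinity learning or dispersion constants. The function name `affinity_propagation`, the damping value, the iteration count, the toy data, the use of negative squared distance as the similarity, and the use of the median similarity as the exemplar preference are all assumptions made for illustration.

```python
from statistics import median


def affinity_propagation(S, damping=0.5, iterations=200):
    """Standard affinity propagation on an n x n similarity matrix S (n >= 2).

    S[k][k] is the 'preference' of point k to be chosen as an exemplar.
    Returns (exemplars, labels); labels[i] is the exemplar index for point i,
    or -1 for every point if no exemplar emerged.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                new = S[i][k] - best_other
                R[i][k] = damping * R[i][k] + (1 - damping) * new

        # a(k,k) = sum over i' != k of max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # A point is an exemplar when its self-responsibility plus
    # self-availability is positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], [-1] * n

    # Exemplars label themselves; every other point joins the exemplar
    # it is most similar to.
    labels = [
        i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
        for i in range(n)
    ]
    return exemplars, labels


# Toy 1-D data (hypothetical): two visually separated groups.
points = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]

# Assumed similarity: negative squared distance between points.
S = [[-(a - b) ** 2 for b in points] for a in points]

# Assumed preference: median of the off-diagonal similarities.
n_points = len(points)
preference = median(
    S[i][j] for i in range(n_points) for j in range(n_points) if i != j
)
for i in range(n_points):
    S[i][i] = preference

exemplars, labels = affinity_propagation(S)
print("exemplars:", exemplars)
print("labels:", labels)
```

The number of exemplars is not fixed in advance; it is driven by the preference values on the diagonal of the similarity matrix, with lower preferences tending to give fewer clusters. The damping factor and iteration count may need adjusting on other data.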
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).