Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/366490
Title: An Investigation of Machine Learning Algorithm for Clustering
Researcher: Shrivastava, Shailendra Kumar
Guide(s): Jain,R.C. and Rana, J.L.
Keywords: Computer Science
Computer Science Software Engineering
Engineering and Technology
University: Rajiv Gandhi Proudyogiki Vishwavidyalaya
Completed Date: 2013
Abstract: Clustering is an unsupervised learning method for finding groups in a given data set. The task of clustering is NP-hard. In general, clustering a data set calls for machine learning techniques, and clustering has attracted much interest in the machine learning community. One of the more recent approaches to finding clusters is affinity propagation, which is based on exemplars. The input to affinity propagation is the set of similarities among data points. The output is a set of representative data points that best describe the clusters. These representative data points are known as exemplars, and the clusters are generated by assigning every non-exemplar data point to its nearest exemplar.
In this thesis, we have developed four algorithms based on affinity propagation and machine learning concepts. Extensive experiments have been carried out to evaluate the performance of these algorithms. The algorithms are Fast Affinity Propagation based on Machine Learning (FAPML), Phrase Affinity Clustering (PAC), K-Means based on Heterogeneous Transfer Learning (K-Means based on HTL), and Affinity Propagation based on Heterogeneous Transfer Learning (AP based on HTL).
FAPML is based on learning by experience, which is the principle of machine learning. FAPML tries to place data points into clusters based on the history of the clusters those points belonged to in earlier stages. In FAPML we have introduced an affinity learning constant and a dispersion constant, which supervise the clustering process. FAPML also enforces the exemplar-consistency and one-of-N constraints.
PAC first finds phrases using Ukkonen's suffix tree construction algorithm, then builds a vector space model of the phrases using the tf-idf weighting scheme. After that, it calculates the similarity matrix from the vector space model using cosine similarity. Affinity propagation is then used to generate the clusters.
In K-Means, the output clusters depend on the initialization of the centroids. K-Means based on HTL tries to solve this problem.
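The abstract describes affinity propagation as taking pairwise similarities as input and returning exemplars plus an assignment of every other point to an exemplar. As a rough illustration only, the following Python sketch shows the standard message-passing form of affinity propagation. It is not FAPML or any other algorithm from the thesis. The function names `similarity_matrix` and `affinity_propagation`, the damping factor, the iteration count, the use of negative squared Euclidean distance as similarity, and the median of the similarities as the exemplar preference are all assumptions made for this example.

```python
import statistics


def similarity_matrix(points):
    """Build similarities as negative squared Euclidean distance (an assumption).

    The diagonal holds each point's "preference" for being an exemplar,
    set here to the median of the off-diagonal similarities.
    """
    n = len(points)
    S = [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
         for p in points]
    preference = statistics.median(
        S[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        S[i][i] = preference
    return S


def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each point, the index of the exemplar it is assigned to."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = (damping * R[i][k]
                           + (1 - damping) * (S[i][k] - best_other))

        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(positive) - positive[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

    # A point is taken as an exemplar when r(k,k) + a(k,k) is positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        raise RuntimeError("no exemplars found; try a higher preference")

    # Exemplars label themselves; other points join their most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


# Toy input: two well-separated groups of one-dimensional points.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
labels = affinity_propagation(similarity_matrix(points))
print(labels)
```

Each entry of `labels` is the index of the exemplar chosen for the corresponding point, so points sharing a value belong to the same cluster. In the PAC setting described above, the similarity matrix would instead come from cosine similarity over tf-idf phrase vectors; that step is not shown here.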
Pagination: 17.1MB
URI: http://hdl.handle.net/10603/366490
Appears in Departments: Computer Science Engineering

Files in This Item:
File                              Description    Size       Format
01 _ title.pdf                    Attached File  661.84 kB  Adobe PDF
03 _ tables of contents.pdf                      265.73 kB  Adobe PDF
04 _list of tables.pdf                           179.78 kB  Adobe PDF
05_ list of figures.pdf                          963.83 kB  Adobe PDF
06 _ acknowledgements.pdf                        167.85 kB  Adobe PDF
07 _chapter 1.pdf                                266.38 kB  Adobe PDF
08 _chapter 2.pdf                                1.1 MB     Adobe PDF
09 _chapter 3.pdf                                1.17 MB    Adobe PDF
10 _ a chapter 5.pdf                             1.4 MB     Adobe PDF
10 _ b chapter 6.pdf                             1.21 MB    Adobe PDF
10 _ c chapter 7.pdf                             1.22 MB    Adobe PDF
10 _chapter 4.pdf                                1.3 MB     Adobe PDF
10 _ d chapter 8.pdf                             246.3 kB   Adobe PDF
11 _ references.pdf                              161.3 kB   Adobe PDF
12 _ list of publications.pdf                    4.78 MB    Adobe PDF
80_recommendation.pdf                            187.69 kB  Adobe PDF
abstract.pdf                                     187.69 kB  Adobe PDF
certificate.pdf                                  782.31 kB  Adobe PDF
declaration by the candidate.pdf                 429.33 kB  Adobe PDF
list of abbreviations.pdf                        173.92 kB  Adobe PDF
preliminary page.pdf                             661.84 kB  Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
