Title: Some techniques to speed-up k-means and kernel k-means clustering methods for large datasets
Researcher: Sarma, T Hitendra
Guide(s): Viswanath, P; Reddy, B Eswara
Keywords: Kernel k-means clustering methods; Speed-Up K-Means
Upload Date: 11-Dec-2013
University: Jawaharlal Nehru Technological University, Anantapuram
Completed Date: 05/07/2012
Abstract: Data clustering is an unsupervised learning activity: the process of finding the natural groups (clusters) present in a given dataset (i.e., a given set of patterns). It has several applications, such as image segmentation, video analysis, bioinformatics, intrusion detection, and outlier detection. New application domains and amassed data pose new challenges in the area of data clustering, and different types of clustering methods have evolved to meet these challenges. Among the existing methods, k-means is one of the simplest and most efficient, and it has been shown to produce good clustering results in various applications. The time complexity of the k-means method grows linearly with the size of the dataset. Because the method is iterative, the entire dataset has to be scanned once in each iteration, which is time-consuming for large datasets. Hence, the k-means method is not suitable for large datasets that do not fit in main memory. Further, the method fails to identify non-convex and linearly inseparable clusters in the input space. The kernel k-means clustering method is a nonlinear extension of the k-means method. By implicitly mapping data points to a higher-dimensional feature space (the induced space) using a nonlinear transformation, the kernel k-means method can discover clusters that are linearly inseparable in the input space. However, the time complexity of this method grows quadratically with the size of the dataset. Hence, the kernel k-means clustering method is also not suitable for large datasets. The present thesis is about speeding up the k-means and kernel k-means clustering methods so that they can work with large datasets. To speed up the k-means method, the thesis proposes two prototype-based hybrid approaches, which give the same result as that obtained by using the conventional k-means method.
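The cost contrast described in the abstract can be illustrated with a minimal sketch. It is not the thesis's proposed speed-up technique. It only shows one assignment pass of conventional k-means, which scans the data once, next to one assignment pass of kernel k-means, which needs the full n x n kernel matrix. The function names, the Gaussian (RBF) kernel, and the gamma parameter are assumptions chosen for illustration.

```python
import math

def kmeans_assign(points, centers):
    """One k-means assignment pass: a single scan of the data, O(n * k * d)."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels

def rbf_kernel_matrix(points, gamma=1.0):
    """Full n x n Gaussian kernel matrix: O(n^2) time and memory.
    The RBF kernel is an illustrative choice, not taken from the thesis."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))
             for q in points] for p in points]

def kernel_kmeans_assign(K, labels, k):
    """One kernel k-means assignment pass using only kernel values.

    Squared feature-space distance from point i to the mean of cluster c:
    K[i][i] - (2/|c|) * sum_j K[i][j] + (1/|c|^2) * sum_{j,l} K[j][l],
    with j and l ranging over the members of c.
    """
    n = len(K)
    members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
    # The within-cluster term is the same for every point, so compute it once.
    within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
              for m in members]
    new_labels = []
    for i in range(n):
        best, best_d = labels[i], float("inf")
        for c, m in enumerate(members):
            if not m:
                continue  # skip empty clusters
            cross = sum(K[i][j] for j in m) / len(m)
            d = K[i][i] - 2.0 * cross + within[c]
            if d < best_d:
                best, best_d = c, d
        new_labels.append(best)
    return new_labels
```

Each point in the kernel version sums kernel values over all cluster members, so one pass touches on the order of n^2 kernel entries, whereas the plain k-means pass touches each point once per center.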
Pagination: 172p.
Appears in Departments: Department of Computer Science and Engineering

Files in This Item:
File                                Description    Size       Format
01_title.pdf                        Attached File  38.76 kB   Adobe PDF
02_certificate.pdf                                 25.21 kB   Adobe PDF
03_acknowledgements.pdf                            15.23 kB   Adobe PDF
04_contents.pdf                                    34.75 kB   Adobe PDF
05_abstract.pdf                                    22.67 kB   Adobe PDF
06_list of tables and figures.pdf                  30.78 kB   Adobe PDF
07_chapter 1.pdf                                   343.88 kB  Adobe PDF
08_chapter 2.pdf                                   190.66 kB  Adobe PDF
09_chapter 3.pdf                                   131.85 kB  Adobe PDF
10_chapter 4.pdf                                   192.67 kB  Adobe PDF
11_chapter 5.pdf                                   299.02 kB  Adobe PDF
12_chapter 6.pdf                                   33.86 kB   Adobe PDF
13_references.pdf                                  115.03 kB  Adobe PDF

Items in Shodhganga are protected by copyright, with all rights reserved, unless otherwise indicated.
