Some techniques to speed-up k-means and kernel k-means clustering methods for large datasets

Sarma, T Hitendra

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/13964

Title:	Some techniques to speed-up k-means and kernel k-means clustering methods for large datasets
Researcher:	Sarma, T Hitendra
Guide(s):	Viswanath, P; Reddy, B Eswara
Keywords:	Kernel k-means; clustering methods; Speed-Up K-Means
Upload Date:	11-Dec-2013
University:	Jawaharlal Nehru Technological University, Anantapuram
Completed Date:	05/07/2012
Abstract:	Data clustering is an unsupervised learning activity: the process of finding the natural groups (clusters) present in a given dataset (i.e., a given set of patterns). It has several applications, such as image segmentation, video analysis, bioinformatics, intrusion detection, and outlier detection. New application domains and amassed data pose new challenges in the area of data clustering, and different types of clustering methods have evolved to cater to these challenges. Among the existing clustering methods, the simplest and most efficient is the k-means clustering method. It has been shown to produce good clustering results in various applications. The time complexity of the k-means method grows linearly with the size of the dataset. In the iterative process of the k-means method, the entire dataset has to be scanned once in each iteration, which is time-consuming for large datasets. Hence, the k-means method is not suitable for large datasets that do not fit in main memory. Further, the method fails to identify non-convex and linearly inseparable clusters in the input space. The kernel k-means clustering method is a nonlinear extension of the k-means method. By implicitly mapping data points to a higher-dimensional feature space (the induced space) using a nonlinear transformation, the kernel k-means method can discover clusters that are linearly inseparable in the input space. However, the time complexity of this method grows quadratically with the size of the dataset. Hence, the kernel k-means clustering method is also not suitable for large datasets. The present thesis is about speeding up the k-means and kernel k-means clustering methods so that they can work with large datasets. In order to speed up the k-means method, the thesis proposes two prototype-based hybrid approaches, which give the same result as that obtained by using the conventional k-means method.
Pagination:	172p.
URI:	http://hdl.handle.net/10603/13964
Appears in Departments:	Department of Computer Science and Engineering
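
The abstract's remark that kernel k-means scales quadratically can be illustrated with a small sketch. This is the conventional kernel k-means assignment step, not the speed-up techniques proposed in the thesis, and every name in it (`rbf_kernel`, `kernel_matrix`, `kernel_kmeans`, the `gamma` parameter, the sample points) is a hypothetical choice made for illustration. It assumes a Gaussian (RBF) kernel and relies on the standard identity that the squared feature-space distance from a point i to the mean of a cluster C equals K[i][i] - (2/|C|) * sum of K[i][j] over j in C + (1/|C|^2) * sum of K[j][l] over j, l in C, so the mapping itself is never computed.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed, illustrative parameter."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_matrix(points, kernel):
    """Full n x n kernel matrix. Building and storing it is the
    quadratic cost that makes the method hard to use on large datasets."""
    return [[kernel(p, q) for q in points] for p in points]


def kernel_kmeans(K, labels, k, max_iter=100):
    """Conventional kernel k-means: reassign each point to the cluster whose
    feature-space mean is nearest, using only entries of K."""
    n = len(K)
    labels = list(labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Mean pairwise kernel value inside each cluster (third term of the distance).
        within = [
            sum(K[j][l] for j in m for l in m) / (len(m) ** 2) if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster cannot attract points
                cross = sum(K[i][j] for j in m) / len(m)
                d = K[i][i] - 2.0 * cross + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable, so the iteration has converged
        labels = new_labels
    return labels


# Illustrative data: two small, well-separated groups and a rough initial labelling.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
K = kernel_matrix(points, rbf_kernel)
final_labels = kernel_kmeans(K, [0, 0, 1, 1, 1, 1], k=2)
```

Like ordinary k-means, this procedure is sensitive to the initial labelling and may stop at a poor local optimum. Every iteration also touches all n x n kernel entries, which is the cost the abstract refers to.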

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	38.76 kB	Adobe PDF
02_certificate.pdf		25.21 kB	Adobe PDF
03_acknowledgements.pdf		15.23 kB	Adobe PDF
04_contents.pdf		34.75 kB	Adobe PDF
05_abstract.pdf		22.67 kB	Adobe PDF
06_list of tables and figures.pdf		30.78 kB	Adobe PDF
07_chapter 1.pdf		343.88 kB	Adobe PDF
08_chapter 2.pdf		190.66 kB	Adobe PDF
09_chapter 3.pdf		131.85 kB	Adobe PDF
10_chapter 4.pdf		192.67 kB	Adobe PDF
11_chapter 5.pdf		299.02 kB	Adobe PDF
12_chapter 6.pdf		33.86 kB	Adobe PDF
13_references.pdf		115.03 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET