Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/13964
Title: | Some techniques to speed-up k-means and kernel k-means clustering methods for large datasets |
Researcher: | Sarma, T Hitendra |
Guide(s): | Viswanath, P; Reddy, B Eswara |
Keywords: | Kernel k-means clustering methods Speed-Up K-Means |
Upload Date: | 11-Dec-2013 |
University: | Jawaharlal Nehru Technological University, Anantapuram |
Completed Date: | 05/07/2012 |
Abstract: | Data clustering is an unsupervised learning task: the process of finding the natural groups (clusters) present in a given dataset (i.e., a given set of patterns). It has several applications, such as image segmentation, video analysis, bioinformatics, intrusion detection, and outlier detection. New application domains and the accumulation of data pose new challenges in data clustering, and different types of clustering methods have evolved to meet them. Among the existing clustering methods, k-means is the simplest and is efficient, and it has been shown to produce good clustering results in various applications. The time complexity of the k-means method grows linearly with the size of the dataset. Because the method is iterative, the entire dataset has to be scanned once in each iteration, which is time-consuming for large datasets. Hence, the k-means method is not suitable for large datasets that do not fit in main memory. Further, the method fails to identify non-convex and linearly inseparable clusters in the input space. The kernel k-means clustering method is a nonlinear extension of the k-means method. By implicitly mapping data points to a higher-dimensional feature space (the induced space) using a nonlinear transformation, the kernel k-means method can discover clusters that are linearly inseparable in the input space. However, the time complexity of this method grows quadratically with the size of the dataset. Hence, the kernel k-means clustering method is also not suitable for large datasets. The present thesis is about speeding up the k-means and kernel k-means clustering methods so that they work with large datasets. To speed up the k-means method, the thesis proposes two prototype-based hybrid approaches, which give the same result as the conventional k-means method. |
Pagination: | 172p. |
URI: | http://hdl.handle.net/10603/13964 |
Appears in Departments: | Department of Computer Science and Engineering |
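The cost contrast drawn in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's method: it shows only the conventional k-means loop (one full scan of the dataset per iteration) and the standard kernel-trick expression for the squared distance from a point to a cluster mean in feature space, whose within-cluster term sums over all pairs of members. The function names `kmeans`, `kernel_distance_sq`, and `linear_kernel` are hypothetical names chosen for this illustration.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Conventional (Lloyd-style) k-means on tuples of numbers.

    Every iteration scans each point once, so the cost per iteration
    is proportional to len(points) * len(centers).
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: one full pass over the dataset.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the mean of its assigned points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centers.append(old)  # leave an empty cluster's center as is
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    return centers, labels


def linear_kernel(x, y):
    """Plain dot product; any positive-definite kernel could be used instead."""
    return sum(a * b for a, b in zip(x, y))


def kernel_distance_sq(x, cluster, kernel):
    """Squared feature-space distance from x to the mean of `cluster`.

    Uses K(x,x) - (2/m) * sum_y K(x,y) + (1/m^2) * sum_{y,z} K(y,z),
    so the cluster mean is never formed explicitly. The last term runs
    over all pairs of members, which is the source of the quadratic cost.
    """
    m = len(cluster)
    cross = sum(kernel(x, y) for y in cluster)
    within = sum(kernel(y, z) for y in cluster for z in cluster)
    return kernel(x, x) - 2.0 * cross / m + within / (m * m)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(data, centers=[(0, 0), (10, 10)]))
    print(kernel_distance_sq((0, 0), [(1, 0), (3, 0)], linear_kernel))
```

With the linear kernel, the feature space is the input space, so the second call returns the ordinary squared distance from (0, 0) to the mean (2, 0) of the two cluster members.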
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
01_title.pdf | Attached File | 38.76 kB | Adobe PDF |
02_certificate.pdf | | 25.21 kB | Adobe PDF |
03_acknowledgements.pdf | | 15.23 kB | Adobe PDF |
04_contents.pdf | | 34.75 kB | Adobe PDF |
05_abstract.pdf | | 22.67 kB | Adobe PDF |
06_list of tables and figures.pdf | | 30.78 kB | Adobe PDF |
07_chapter 1.pdf | | 343.88 kB | Adobe PDF |
08_chapter 2.pdf | | 190.66 kB | Adobe PDF |
09_chapter 3.pdf | | 131.85 kB | Adobe PDF |
10_chapter 4.pdf | | 192.67 kB | Adobe PDF |
11_chapter 5.pdf | | 299.02 kB | Adobe PDF |
12_chapter 6.pdf | | 33.86 kB | Adobe PDF |
13_references.pdf | | 115.03 kB | Adobe PDF |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).