Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/214124
Title: | DESIGN AND DEVELOPMENT OF EFFICIENT CLUSTERING TECHNIQUES IN DATA MINING |
Researcher: | CHADHA ANUPAMA |
Guide(s): | SURESH KUMAR |
Keywords: | DESIGN, DEVELOPMENT, CLUSTERING, DATA MINING |
University: | Manav Rachna International University |
Completed Date: | 2018 |
Abstract: | Data mining is the process of extracting useful patterns or knowledge from the large volumes of data collected in information systems and using these patterns to make sound, well-informed decisions. The predefined methods and algorithms used to extract these patterns are called data mining techniques. Clustering is a data mining technique that divides a given dataset into groups, or clusters, such that the objects in one group are more similar to each other than to the objects in other groups. Many clustering algorithms have been proposed in the literature. They are broadly classified into two categories: hierarchical and partitional. The K-Means algorithm is one of the most commonly used techniques in the partitional category. K-Means is a simple algorithm known for its speed. It is computationally inexpensive and works well with high-dimensional and large datasets. However, the algorithm has some limitations. One major limitation is that the number of clusters (K) must be specified in advance as input. Choosing a value of K is domain specific. It is sometimes difficult to predict the required number of clusters in advance because the dataset is unknown or new, and in that case a poor grouping of the data may result. These limitations of K-Means carry over to its extensions, K-Modes and K-Prototype. Various extensions of K-Means for numerical, categorical and mixed datasets have been proposed in the literature to remove the need to provide K as input, but these algorithms either require some input parameter other than K or are computationally complex. K-Modes, an extension of the K-Means algorithm for categorical data, is known for its simplicity and speed. Because K-Modes operates on categorical data, it uses the simple matching dissimilarity measure instead of Euclidean distance and the modes of clusters instead of their means. |
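The two substitutions that K-Modes makes relative to K-Means can be illustrated with a short sketch. This is not the thesis's own code: the function names (`simple_matching_dissimilarity`, `cluster_mode`, `assign`) and the toy records are hypothetical, chosen only to show how a mismatch count replaces Euclidean distance and how a per-attribute mode replaces a mean. K still has to be supplied, here as the number of initial modes.

```python
from collections import Counter


def simple_matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Return the most frequent value of each attribute in a non-empty cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def assign(records, modes):
    """Assign each record to the index of its nearest mode (ties go to the lowest index)."""
    return [
        min(range(len(modes)),
            key=lambda k: simple_matching_dissimilarity(record, modes[k]))
        for record in records
    ]


# Hypothetical toy data: (colour, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
]
initial_modes = [records[0], records[3]]  # K = 2, chosen by hand
labels = assign(records, initial_modes)
```

A full K-Modes run would alternate `assign` with recomputing `cluster_mode` for each cluster until the labels stop changing, in the same way K-Means alternates assignment and mean updates.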
Pagination: | |
URI: | http://hdl.handle.net/10603/214124 |
Appears in Departments: | Department of Computer Science Engineering |
Files in This Item:
File | Description | Size | Format
---|---|---|---
01_thesis front page.pdf | Attached File | 38.16 kB | Adobe PDF
02_thesis declaration.pdf | | 46.76 kB | Adobe PDF
03_certificate.pdf | | 61.52 kB | Adobe PDF
04_acknowledgement.pdf | | 6.62 kB | Adobe PDF
05_list of publications.pdf | | 51.97 kB | Adobe PDF
06_abstract.pdf | | 55.33 kB | Adobe PDF
07_table of contents.pdf | | 31.78 kB | Adobe PDF
08_list of tables.pdf | | 116.63 kB | Adobe PDF
09_list of figures.pdf | | 14.77 kB | Adobe PDF
10_chapter1.pdf | | 548.19 kB | Adobe PDF
11_chapter 2.pdf | | 189.72 kB | Adobe PDF
12_chapter 3.pdf | | 789.16 kB | Adobe PDF
13_chapter 4.pdf | | 624.77 kB | Adobe PDF
14_chapter 5.pdf | | 578.22 kB | Adobe PDF
15_chapter 6.pdf | | 62.98 kB | Adobe PDF
16_references.pdf | | 245.25 kB | Adobe PDF
17_appendix a.pdf | | 226.22 kB | Adobe PDF
18_brief profile of scholar.pdf | | 4.32 kB | Adobe PDF
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).