DESIGN AND DEVELOPMENT OF EFFICIENT CLUSTERING TECHNIQUES IN DATA MINING

CHADHA ANUPAMA

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/214124

Title:	DESIGN AND DEVELOPMENT OF EFFICIENT CLUSTERING TECHNIQUES IN DATA MINING
Researcher:	CHADHA ANUPAMA
Guide(s):	SURESH KUMAR
Keywords:	DESIGN, DEVELOPMENT, CLUSTERING, DATA MINING
University:	Manav Rachna International University
Completed Date:	2018
Abstract:	Data mining is the process of extracting useful patterns or knowledge from the large volumes of data collected in information systems and using these patterns to make safe and smart decisions. The predefined methods and algorithms used to extract these patterns are called data mining techniques. Clustering is a data mining technique that divides a given dataset into groups, or clusters, such that the objects in one group are more similar to each other than to the objects in other groups. Many clustering algorithms have been proposed in the literature. They are broadly classified into two categories, hierarchical and partitional. The K-Means algorithm is one of the commonly used techniques in the partitional category. K-Means is a simple algorithm known for its speed. It is computationally inexpensive and works well with high-dimensional and large datasets. However, the algorithm has some limitations. One major limitation is that the number of clusters (K) must be specified as input. Choosing a value of K is domain specific. It is sometimes difficult to predict the required number of clusters in advance because the dataset is unknown or new, and in that case an inefficient grouping of the data may result. These limitations of K-Means carry over to its extensions K-Modes and K-Prototype. Various extensions of K-Means for numerical, categorical and mixed datasets have been proposed in the literature to remove the need to provide K as input, but these algorithms either require some input parameter other than K or are computationally complex. K-Modes, an extension of the K-Means algorithm for categorical data, is known for its simplicity and speed. Because K-Modes is used for categorical data, the simple matching dissimilarity measure is used instead of Euclidean distance, and the modes of clusters are used instead of means.
URI:	http://hdl.handle.net/10603/214124
Appears in Departments:	Department of Computer Science Engineering
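
The abstract describes K-Modes as K-Means with two substitutions: the simple matching dissimilarity measure in place of Euclidean distance, and cluster modes in place of cluster means. The following is a minimal Python sketch of that idea only, not the thesis's own algorithm. The function names, the sample records, and the choice to pass the initial modes in by hand are all assumptions made for illustration. Note that the number of clusters is still fixed by the caller here, which is the limitation the abstract points out.

```python
from collections import Counter


def simple_matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Return the attribute-wise most frequent value of a non-empty cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def k_modes(records, initial_modes, max_iter=10):
    """Alternate between assigning records to the nearest mode and updating modes.

    The number of clusters is len(initial_modes); it is chosen by the caller
    (an assumption of this sketch, mirroring the K-as-input limitation).
    """
    modes = list(initial_modes)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each record goes to the mode it mismatches least.
        labels = [
            min(range(len(modes)),
                key=lambda j: simple_matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        # Update step: recompute each mode; keep the old one if a cluster is empty.
        new_modes = []
        for j, old in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == j]
            new_modes.append(cluster_mode(members) if members else old)
        if new_modes == modes:
            break
        modes = new_modes
    return labels, modes


# Hypothetical categorical records: (colour, size, shape).
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
    ("blue", "large", "square"),
    ("red", "small", "round"),
]
labels, modes = k_modes(records, initial_modes=[records[0], records[2]])
print(labels, modes)
```

Because the dissimilarity is a plain mismatch count, no numeric encoding of the categories is needed, which is what makes the measure suitable for categorical data.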

Files in This Item:

File	Description	Size	Format
01_thesis front page.pdf	Attached File	38.16 kB	Adobe PDF
02_thesis declaration.pdf		46.76 kB	Adobe PDF
03_certificate.pdf		61.52 kB	Adobe PDF
04_acknowledgement.pdf		6.62 kB	Adobe PDF
05_list of publications.pdf		51.97 kB	Adobe PDF
06_abstract.pdf		55.33 kB	Adobe PDF
07_table of contents.pdf		31.78 kB	Adobe PDF
08_list of tables.pdf		116.63 kB	Adobe PDF
09_list of figures.pdf		14.77 kB	Adobe PDF
10_chapter1.pdf		548.19 kB	Adobe PDF
11_chapter 2.pdf		189.72 kB	Adobe PDF
12_chapter 3.pdf		789.16 kB	Adobe PDF
13_chapter 4.pdf		624.77 kB	Adobe PDF
14_chapter 5.pdf		578.22 kB	Adobe PDF
15_chapter 6.pdf		62.98 kB	Adobe PDF
16_references.pdf		245.25 kB	Adobe PDF
17_appendix a.pdf		226.22 kB	Adobe PDF
18_brief profile of scholar.pdf		4.32 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET