Formulation and analysis of algorithms involved in data mining

TRIPATHY,MURCHHANA

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/356781

Title:	Formulation and analysis of algorithms involved in data mining
Researcher:	TRIPATHY,MURCHHANA
Guide(s):	Champati,Santilata
Keywords:	Computer Science; Computer Science Theory and Methods; Engineering and Technology
University:	Siksha 'O' Anusandhan University
Completed Date:	2021
Abstract:	Data Mining involves the study and analysis of large datasets to discover hidden knowledge and information. The prime objective of this thesis is to study some algorithms used in Data Mining and to apply concepts from linear algebra, rough set theory and analysis to improve the quality of their output and simplify their implementation. Many classical algorithms exist to perform various Data Mining tasks such as classification, clustering, association mining, time-series data analysis and outlier detection, and many more are formulated every day by the research community. This thesis tries to understand and investigate the algorithm selection problem and some of the algorithms and methods used in classification and clustering, and to formulate a few new ones. The first part of the thesis deals with the algorithm selection problem, using meta-learning based analysis and ranking combination; this can be considered a general class of problem arising in the field of Data Mining. As an outcome of the analysis, two new ranking combination methods, namely Percentage ranking and Relative ranking, have been proposed for algorithm selection. These methods and three existing ranking combination methods, namely Average ranking, Score ranking and Winner ranking, have been analyzed from the perspective of a measurement system. The analysis shows that the proposed Percentage ranking method helps to answer how much better an algorithm is than the others in terms of performance. From the general problem of algorithm selection in Data Mining, we move to clustering in the second part, because it was the next easiest thing to study among Data Mining algorithms. Clustering requires no information about a dataset other than the dataset itself. However, some of the data may be missing. To deal with that, missing-data imputation using the median and the mode has been proposed, inspired by mean imputation. Two matrix factorization techniques, the SVD (Singular Value Decomposition) and the NMF (Nonnegative Matrix Facto
Pagination:	xvi,138
URI:	http://hdl.handle.net/10603/356781
Appears in Departments:	Department of Computer Science
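
The abstract mentions imputing missing data with the median and the mode, by analogy with mean imputation. The thesis's own formulation is not given in this record, so the following is only a minimal sketch of the general idea, assuming missing entries are marked with None; the function name impute_column and the sample columns are hypothetical.

```python
from statistics import median, mode


def impute_column(values, strategy="median"):
    """Fill None entries with the median or mode of the observed entries.

    Illustrative only: the name, the None marker and the per-column
    treatment are assumptions, not details taken from the thesis.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")

    if strategy == "median":
        fill = median(observed)   # suited to numeric columns
    elif strategy == "mode":
        fill = mode(observed)     # also works for categorical columns
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Keep observed values as they are and substitute only the gaps.
    return [fill if v is None else v for v in values]


if __name__ == "__main__":
    ages = [23, None, 31, 27, None, 45]
    colours = ["red", "blue", None, "red"]
    print(impute_column(ages, "median"))
    print(impute_column(colours, "mode"))
```

The median is less sensitive to extreme values than the mean, and the mode extends the same idea to categorical attributes, which may be why both are natural companions to mean imputation.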

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	410.29 kB	Adobe PDF
02-declaration.pdf		277.58 kB	Adobe PDF
03_certificate.pdf		279.91 kB	Adobe PDF
04_acknowledgement.pdf		147.68 kB	Adobe PDF
05_contents.pdf		379.71 kB	Adobe PDF
06_list of figures and table.pdf		409.91 kB	Adobe PDF
07_chapter 1.pdf		586.31 kB	Adobe PDF
08_chapter 2.pdf		604.82 kB	Adobe PDF
09_chapter 3.pdf		696.63 kB	Adobe PDF
10_chapter 4.pdf		3.8 MB	Adobe PDF
11_chapter 5.pdf		594.68 kB	Adobe PDF
12_chapter 6.pdf		410.14 kB	Adobe PDF
13_bibliography.pdf		384.59 kB	Adobe PDF
80_recommendation.pdf		174.43 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET