Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/356781
Title: Formulation and analysis of algorithms involved in data mining
Researcher: Tripathy, Murchhana
Guide(s): Champati, Santilata
Keywords: Computer Science
Computer Science Theory and Methods
Engineering and Technology
University: Siksha 'O' Anusandhan University
Completed Date: 2021
Abstract: Data Mining involves the study and analysis of large datasets to discover hidden knowledge and information. The prime objective of this thesis is to study some algorithms used in Data Mining and to apply concepts from linear algebra, rough set theory and analysis to improve the quality of their output and simplify their implementation. Many classical algorithms exist for Data Mining tasks such as classification, clustering, association mining, time-series data analysis and outlier detection, and the research community formulates new ones every day. This thesis investigates the algorithm selection problem and some of the algorithms and methods used in classification and clustering, and formulates a few new ones. The first part of the thesis deals with the algorithm selection problem, which can be considered a general class of problem arising in Data Mining, using meta-learning based analysis and ranking combination. As an outcome of the analysis, two new ranking combination methods, namely the Percentage ranking and the Relative ranking, have been proposed for algorithm selection. These methods and three existing ranking combination methods (Average ranking, Score ranking and Winner ranking) have been analyzed from the perspective of a measurement system. The analysis shows that our proposed Percentage ranking method helps to answer how much better one algorithm performs than the others. From the general problem of algorithm selection in Data Mining, we move to clustering in the second part because it was the next easiest thing to study among Data Mining algorithms. Clustering requires no information about a dataset other than the dataset itself. However, some of the data may be missing. To deal with that, missing data imputation using the median and the mode has been proposed, inspired by mean imputation. Two matrix factorization techniques, the SVD (Singular Value Decomposition) and the NMF (Nonnegative Matrix Facto
Pagination: xvi,138
URI: http://hdl.handle.net/10603/356781
Appears in Departments: Department of Computer Science

Files in This Item:
File                                 Description    Size       Format
01_title.pdf                         Attached File  410.29 kB  Adobe PDF
02-declaration.pdf                                  277.58 kB  Adobe PDF
03_certificate.pdf                                  279.91 kB  Adobe PDF
04_acknowledgement.pdf                              147.68 kB  Adobe PDF
05_contents.pdf                                     379.71 kB  Adobe PDF
06_list of figures and table.pdf                    409.91 kB  Adobe PDF
07_chapter 1.pdf                                    586.31 kB  Adobe PDF
08_chapter 2.pdf                                    604.82 kB  Adobe PDF
09_chapter 3.pdf                                    696.63 kB  Adobe PDF
10_chapter 4.pdf                                    3.8 MB     Adobe PDF
11_chapter 5.pdf                                    594.68 kB  Adobe PDF
12_chapter 6.pdf                                    410.14 kB  Adobe PDF
13_bibliography.pdf                                 384.59 kB  Adobe PDF
80_recommendation.pdf                               174.43 kB  Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).