Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/356781
Title: | Formulation and analysis of algorithms involved in data mining |
Researcher: | TRIPATHY, MURCHHANA |
Guide(s): | Champati, Santilata |
Keywords: | Computer Science; Computer Science Theory and Methods; Engineering and Technology |
University: | Siksha 'O' Anusandhan University |
Completed Date: | 2021 |
Abstract: | Data Mining involves the study and analysis of large datasets to discover hidden knowledge and information. The prime objective of this thesis is to study some algorithms used in Data Mining and to apply concepts from linear algebra, rough set theory and analysis to improve the quality of their output and simplify their implementation. Many classical algorithms exist to perform Data Mining tasks such as classification, clustering, association mining, time-series data analysis and outlier detection, and new algorithms are formulated every day by the research community. This thesis investigates the algorithm selection problem and some of the algorithms and methods used in classification and clustering, and formulates a few new ones. The first part of the thesis deals with the algorithm selection problem using meta-learning based analysis and ranking combination, which can be considered a general class of problem arising in the field of Data Mining. As an outcome of the analysis, two new ranking combination methods, namely the Percentage ranking and the Relative ranking, have been proposed for algorithm selection. These methods and three existing ranking combination methods, namely Average ranking, Score ranking and Winner ranking, have been analyzed from the perspective of a measurement system. The analysis shows that our proposed Percentage ranking method helps to answer how much better an algorithm is than the others in terms of performance. From the general problem of algorithm selection in Data Mining, we move to clustering in the second part because it was the next easiest thing to study in Data Mining algorithms. Clustering does not require any information about a dataset other than the dataset itself. However, some of the data may be missing. To deal with that, missing data imputation using the median and the mode has been proposed, inspired by mean imputation. Two matrix factorization techniques, the SVD (Singular Value Decomposition) and the NMF (Nonnegative Matrix Facto |
Pagination: | xvi, 138 |
URI: | http://hdl.handle.net/10603/356781 |
Appears in Departments: | Department of Computer Science |
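The abstract mentions missing data imputation using the median and the mode, inspired by mean imputation. The record does not give the thesis's actual procedure, so the following is only a minimal sketch of the general idea: each missing entry in a column is replaced by the median (for numeric data) or the mode (for categorical data) of the observed entries in that column. The function name `impute_column`, the use of `None` as the missing-value marker, and the sample data are all assumptions made for illustration.

```python
from statistics import median, mode


def impute_column(values, strategy="median"):
    """Fill None entries in one column with the median or mode of the observed values.

    Illustrative only: the helper name and the None-as-missing convention are
    assumptions, not taken from the thesis.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")

    if strategy == "median":
        fill = median(observed)   # suited to numeric columns
    elif strategy == "mode":
        fill = mode(observed)     # suited to categorical columns
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Keep observed values as they are; substitute the fill value for gaps.
    return [fill if v is None else v for v in values]


# Hypothetical sample columns with missing entries.
ages = [23, None, 31, 27, None, 40]
colours = ["red", "blue", None, "red"]

imputed_ages = impute_column(ages, "median")
imputed_colours = impute_column(colours, "mode")
print(imputed_ages)
print(imputed_colours)
```

Compared with the mean, the median is less affected by extreme values, and the mode applies to non-numeric attributes, which is one plausible motivation for considering these two alternatives.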
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
01_title.pdf | Attached File | 410.29 kB | Adobe PDF | View/Open |
02-declaration.pdf | | 277.58 kB | Adobe PDF | View/Open |
03_certificate.pdf | | 279.91 kB | Adobe PDF | View/Open |
04_acknowledgement.pdf | | 147.68 kB | Adobe PDF | View/Open |
05_contents.pdf | | 379.71 kB | Adobe PDF | View/Open |
06_list of figures and table.pdf | | 409.91 kB | Adobe PDF | View/Open |
07_chapter 1.pdf | | 586.31 kB | Adobe PDF | View/Open |
08_chapter 2.pdf | | 604.82 kB | Adobe PDF | View/Open |
09_chapter 3.pdf | | 696.63 kB | Adobe PDF | View/Open |
10_chapter 4.pdf | | 3.8 MB | Adobe PDF | View/Open |
11_chapter 5.pdf | | 594.68 kB | Adobe PDF | View/Open |
12_chapter 6.pdf | | 410.14 kB | Adobe PDF | View/Open |
13_bibliography.pdf | | 384.59 kB | Adobe PDF | View/Open |
80_recommendation.pdf | | 174.43 kB | Adobe PDF | View/Open |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).