Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/156423
Title: Studies on accuracy enhancement by efficient feature selection and cost sensitive models
Researcher: R KARTHIKEYAN
Guide(s): V. KHANAA
University: Bharath University
Completed Date: 2017
Abstract: Although many factors influence the final results of a prediction exercise, the most important ones must be selected with great care, especially in a context such as diagnosing heart disease. We focus on dataset size, cost sensitivity, regional influence, and prioritization of the features. Mining data sets of different sizes or from different regions often does not yield similar results with the expected maximum accuracy. Hence the data size and the inherent regional characteristics act as important parameters for mining exercises. In this research, we first consider data sets from two different geographical regions and calculate the performance measures for each separately. We also compute the same measures independently for the integrated data set obtained by the union of the original sets; the inverse results establish the hypothesis for the integrated data set. Secondly, we consider the issue of mechanizing the prediction of new patients' heart disease diagnosis; data mining on historical data is an extremely useful tool in cardiology. Many studies focus on the specific aspect of filtering the attributes. The objective here is twofold: first, we look into four distinct classifiers for evaluating the relevancy of the attributes, and then we investigate the effects of feature selection in such experiments. Thirdly, we address the issue of summative measures, which sometimes vary drastically due to variations in the distribution of instances in the underlying dataset. Hence we have undertaken the problem of visualizing sensitivity versus specificity graphically, even in the presence of appreciable accuracy. In this context, iterating over the familiar set of classifiers, we establish a maximum area under the ROC curve of around 0.9063 and a maximum accuracy of 75.667%.
Finally, we attend to the problem of cost models for measuring the loss due to wrong predictions, which are of interest and vary very much applicatio
URI: http://hdl.handle.net/10603/156423
Appears in Departments: Department of Computer Science and Engineering

Files in This Item:
File: d11cs014_r.karthikeyan_thesis.pdf
Description: Attached File
Size: 1.83 MB
Format: Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
