Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/557668
Title: Unbalanced data classification using genetic programming
Researcher: Kumar, Arvind
Guide(s): Goel, Shivani
Keywords: Computer Science
Computer Science Software Engineering
Engineering and Technology
University: Bennett University
Completed Date: 2022
Abstract: In many real-world classification applications, such as medical diagnosis, fraud detection, bioinformatics, or fault diagnostics, one class commonly has only a limited number of training instances (the minority class), while the other class (the majority class) makes up the rest. Such data sets are called unbalanced. In data classification, machine learning (ML) methods can suffer from a performance bias when the data set is unbalanced: the trained classifiers may have good accuracy on the majority class but lower accuracy on the minority class. Genetic Programming (GP) is a promising machine learning method, based on the Darwinian theory of evolution, that automatically evolves computer programs to solve problems without any domain-specific knowledge. Although GP has shown much success in developing reliable and precise classifiers for typical classification tasks, GP, like many other ML algorithms, can produce biased classifiers when the data is unbalanced. This bias arises because traditional training criteria, such as the overall success rate used in the GP fitness function, can be dominated by the larger number of instances from the majority class.
This research focuses on algorithmic methods, assuming that the whole training data is important and valuable and that no data sample should be removed from the training process. The second consideration in this work is that the proposed methods should be problem-independent and should not expect any a priori domain-specific or expert knowledge. Thus, this research focuses on developing GP-based approaches for unbalanced data set classification, based on internal cost alteration in the GP fitness function, which allows the unbalanced data set to be used as is in the training process.
This research work demonstrates that by designing various methods in GP, we can evolve classifiers with good classification performance on both the majority and the minority classes. These developed methods are eval
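The bias described in the abstract can be illustrated with a small sketch. The thesis's actual fitness functions are not given in this record, so the code below is only an assumption-based illustration: `overall_accuracy` stands for the traditional overall success rate, and `balanced_fitness` is a hypothetical class-balanced alternative (the mean of the per-class accuracies), one common way to alter the cost inside a fitness function without removing any training samples. The labels 0 (majority) and 1 (minority) and the 95/5 split are made up for the example.

```python
from typing import Sequence


def overall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all instances classified correctly (overall success rate)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_fitness(y_true: Sequence[int], y_pred: Sequence[int],
                     minority: int = 1, majority: int = 0) -> float:
    """Hypothetical fitness: mean of minority and majority class accuracy."""
    def class_accuracy(label: int) -> float:
        # Indices of the instances that truly belong to this class.
        idx = [i for i, t in enumerate(y_true) if t == label]
        return sum(y_pred[i] == label for i in idx) / len(idx)

    return 0.5 * (class_accuracy(minority) + class_accuracy(majority))


# Illustrative data: 95 majority instances (0) and 5 minority instances (1).
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

print(overall_accuracy(y_true, y_pred))  # 0.95
print(balanced_fitness(y_true, y_pred))  # 0.5
```

Under the overall success rate, the trivial always-majority classifier looks strong even though it never detects a minority instance; the balanced measure exposes this while still using every training sample as is.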
Pagination: 89p.
URI: http://hdl.handle.net/10603/557668
Appears in Departments: School of Computer Science Engineering and Technology

Files in This Item:
File                    Description    Size       Format
01_title.pdf            Attached File  91.27 kB   Adobe PDF
02_prelim page.pdf                     1.13 MB    Adobe PDF
03_content.pdf                         55.87 kB   Adobe PDF
04_abstract.pdf                        47.82 kB   Adobe PDF
05_chapter 1.pdf                       227.3 kB   Adobe PDF
06_chapter 2.pdf                       536.17 kB  Adobe PDF
07_chapter 3.pdf                       121.13 kB  Adobe PDF
08_chapter 4.pdf                       157.69 kB  Adobe PDF
09_chapter 5.pdf                       397.91 kB  Adobe PDF
10_chapter 6.pdf                       50.79 kB   Adobe PDF
11_annexures.pdf                       189.61 kB  Adobe PDF
80_recommendation.pdf                  141.27 kB  Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).