Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/557668
Title: | Unbalanced data classification using genetic programming |
Researcher: | Kumar, Arvind |
Guide(s): | Goel, Shivani |
Keywords: | Computer Science; Computer Science Software Engineering; Engineering and Technology |
University: | Bennett University |
Completed Date: | 2022 |
Abstract: | In many real-world classification applications, such as medical diagnosis, fraud detection, bioinformatics, or fault diagnostics, it is common for one class (the minority class) to have only a limited number of training instances, while the other class (the majority class) makes up the rest. Such data sets are called unbalanced. In data classification, machine learning (ML) methods can exhibit a performance bias when the data set is unbalanced: the trained classifiers may have good accuracy on the majority class but lower accuracy on the minority class. Genetic Programming (GP) is a promising machine learning method, based on the Darwinian theory of evolution, that automatically evolves computer programs to solve problems without any domain-specific knowledge. Although GP has shown much success in developing reliable and precise classifiers for typical classification tasks, GP, like many other ML algorithms, can produce biased classifiers when the data is unbalanced. This bias arises because traditional training criteria, such as the overall success rate used in the GP fitness function, can be dominated by the larger number of instances from the majority class.
This research focuses on algorithmic methods, assuming that the whole training data is important and valuable and that no data sample should be removed from the training process. The second consideration in this work is that the proposed methods should be problem-independent and should not require any a priori domain-specific or expert knowledge. Thus, this research focuses on developing GP-based approaches for unbalanced data set classification, based on internal cost adjustment in the GP fitness function, which allows the unbalanced data set to be used as is in the training process.
This research work demonstrates that, by designing various methods in GP, we can evolve classifiers with good classification performance on both the majority and the minority classes. These developed methods are eval |
Pagination: | 89p. |
URI: | http://hdl.handle.net/10603/557668 |
Appears in Departments: | School of Computer Science Engineering and Technology |
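The abstract's point about overall success rate being dominated by the majority class can be illustrated with a small sketch. The data, the function names, and the choice of averaging per-class accuracies are assumptions made here for illustration; the record does not state the fitness functions actually used in the thesis.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of all instances classified correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def balanced_fitness(y_true, y_pred, minority=1, majority=0):
    """Mean of the two per-class accuracies.

    Hypothetical illustrative fitness, not the thesis's method.
    Assumes both classes occur at least once in y_true.
    """
    def class_accuracy(label):
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        return hits / len(idx)

    return 0.5 * (class_accuracy(minority) + class_accuracy(majority))


# Hypothetical data: 95 majority (0) and 5 minority (1) instances.
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class.
always_majority = [0] * 100

print(overall_accuracy(y_true, always_majority))  # 0.95
print(balanced_fitness(y_true, always_majority))  # 0.5
```

Under these assumptions, a classifier that never detects the minority class still scores 0.95 on overall accuracy, while a fitness that weights both classes equally exposes it at 0.5, and no training instance has to be removed or resampled to get that signal.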
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
01_title.pdf | Attached File | 91.27 kB | Adobe PDF | View/Open |
02_prelim page.pdf | | 1.13 MB | Adobe PDF | View/Open |
03_content.pdf | | 55.87 kB | Adobe PDF | View/Open |
04_abstract.pdf | | 47.82 kB | Adobe PDF | View/Open |
05_chapter 1.pdf | | 227.3 kB | Adobe PDF | View/Open |
06_chapter 2.pdf | | 536.17 kB | Adobe PDF | View/Open |
07_chapter 3.pdf | | 121.13 kB | Adobe PDF | View/Open |
08_chapter 4.pdf | | 157.69 kB | Adobe PDF | View/Open |
09_chapter 5.pdf | | 397.91 kB | Adobe PDF | View/Open |
10_chapter 6.pdf | | 50.79 kB | Adobe PDF | View/Open |
11_annexures.pdf | | 189.61 kB | Adobe PDF | View/Open |
80_recommendation.pdf | | 141.27 kB | Adobe PDF | View/Open |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).