Unbalanced data classification using genetic programming

Kumar, Arvind

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/557668

Title:	Unbalanced data classification using genetic programming
Researcher:	Kumar, Arvind
Guide(s):	Goel, Shivani
Keywords:	Computer Science; Computer Science Software Engineering; Engineering and Technology
University:	Bennett University
Completed Date:	2022
Abstract:	In many real-world classification applications, such as medical diagnosis, fraud detection, bioinformatics, or fault diagnostics, it is common for one class to have only a limited number of training instances (called the minority class), while the other class (called the majority class) comprises the rest. Such data sets are called unbalanced. When data sets are unbalanced, machine learning (ML) methods can suffer from a performance bias: the trained classifiers may achieve good accuracy on the majority class but lower accuracy on the minority class. Genetic Programming (GP) is a promising ML method, based on the Darwinian theory of evolution, that automatically evolves computer programs to solve problems without any domain-specific knowledge. Although GP has shown much success in developing reliable and accurate classifiers for typical classification tasks, GP, like many other ML algorithms, can produce biased classifiers when the data are unbalanced. This bias arises because traditional training criteria in the GP fitness function, such as the overall success rate, can be dominated by the larger number of instances from the majority class.
This research focuses on algorithmic methods, assuming that the whole training data set is important and valuable and that no data sample should be removed from the training process. The second consideration in this work is that the proposed methods should be problem-independent and should not require any a priori domain-specific or expert knowledge. Thus, this research develops GP-based approaches for unbalanced data set classification that rely on internal cost alteration in the GP fitness function, allowing the unbalanced data set to be used as is in the training process.
This research work demonstrates that, by designing various methods in GP, we can evolve classifiers with good classification performance on both the majority and the minority classes. These developed methods are eval
Pagination:	89p.
URI:	http://hdl.handle.net/10603/557668
Appears in Departments:	School of Computer Science Engineering and Technology
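The abstract attributes the bias to fitness measures such as the overall success rate. The minimal Python sketch below illustrates why, under stated assumptions: a GP-evolved program is modeled as a function returning a number, and output >= 0 is taken to predict the minority class (a common GP convention, assumed here). It compares the overall success rate with the average of per-class accuracies, one widely used cost-adjusted alternative; this is an illustration only, not the thesis's own fitness function, and all names (`predict`, `overall_accuracy`, `average_class_accuracy`) are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical model of a GP-evolved program: feature vector -> real number.
Program = Callable[[Sequence[float]], float]


def predict(program: Program, x: Sequence[float]) -> int:
    # Assumed convention: output >= 0 predicts the minority class (1).
    return 1 if program(x) >= 0 else 0


def overall_accuracy(program: Program, X: List[List[float]], y: List[int]) -> float:
    """Standard fitness: fraction of all instances classified correctly."""
    correct = sum(predict(program, xi) == yi for xi, yi in zip(X, y))
    return correct / len(y)


def average_class_accuracy(program: Program, X: List[List[float]], y: List[int]) -> float:
    """Cost-adjusted fitness: mean of per-class accuracies, so each class
    contributes equally regardless of how many instances it has."""
    accs = []
    for cls in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == cls]
        hits = sum(predict(program, X[i]) == cls for i in idx)
        accs.append(hits / len(idx))
    return sum(accs) / len(accs)


# Toy unbalanced data: 95 majority (0) instances, 5 minority (1) instances.
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

# A degenerate program that always predicts the majority class.
always_majority: Program = lambda x: -1.0
# A program that separates the two classes perfectly on this toy data.
threshold_program: Program = lambda x: x[0] - 94.5

print(overall_accuracy(always_majority, X, y))        # 0.95
print(average_class_accuracy(always_majority, X, y))  # 0.5
print(average_class_accuracy(threshold_program, X, y))  # 1.0
```

Under the overall success rate, the program that ignores the minority class still scores 0.95, so evolution has little pressure to improve on it; averaging per-class accuracies drops its score to 0.5 and rewards only programs that also classify the minority class correctly.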

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	91.27 kB	Adobe PDF
02_prelim page.pdf		1.13 MB	Adobe PDF
03_content.pdf		55.87 kB	Adobe PDF
04_abstract.pdf		47.82 kB	Adobe PDF
05_chapter 1.pdf		227.3 kB	Adobe PDF
06_chapter 2.pdf		536.17 kB	Adobe PDF
07_chapter 3.pdf		121.13 kB	Adobe PDF
08_chapter 4.pdf		157.69 kB	Adobe PDF
09_chapter 5.pdf		397.91 kB	Adobe PDF
10_chapter 6.pdf		50.79 kB	Adobe PDF
11_annexures.pdf		189.61 kB	Adobe PDF
80_recommendation.pdf		141.27 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET