Machine learning based multi omics data analysis to identify subgroups in cancer for precision medicine

Khadirnaikar, Seema R

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/588713

Title:	Machine learning based multi omics data analysis to identify subgroups in cancer for precision medicine
Researcher:	Khadirnaikar, Seema R
Guide(s):	Mahadeva Prasanna, S R and Shukla, Sudhanshu
Keywords:	Classification, clustering; Conditional WGAN (cWGAN); Data augmentation; Dimensionality reduction; Engineering; Engineering and Technology; Engineering Electrical and Electronic; Machine learning; Non-small cell lung cancer (NSCLC); Precision medicine
University:	Indian Institute of Technology Dharwad
Completed Date:	2023
Abstract:	The mortality rate associated with cancer is increasing at an exponential rate each year. Cancer is a complex illness with notable diversity, making it crucial to adopt precision medicine approaches. Precision medicine endeavors to categorize patients into smaller subgroups based on molecular similarities. It also advocates for customized treatment plans that address the molecular variations within these subgroups, ultimately enhancing patient care. Currently, the prevailing practice involves classifying cancer patients primarily according to tumor grade and stage, which overlooks molecular variations and proves effective only in certain cases. Hence, there is an imperative to identify subgroups that consider molecular-level variations. Moreover, characterizing patients based on these subgroups can yield valuable insights that facilitate precision therapy.
This work initially focuses on identifying subgroups in non-small cell lung cancer (NSCLC), a leading cause of cancer-related deaths worldwide. To accomplish this, data from multiple molecular levels, including mRNA expression, miRNA expression, methylation, and protein expression, are combined and reduced to a lower dimension using an auto-encoder (AE), a machine learning technique for non-linear dimensionality reduction. Consensus K-means clustering is then applied to group patients with similar characteristics, resulting in the classification of NSCLC patients into five subgroups. Several statistical tests are then employed to identify the specific features that are differentially expressed (DE) in each subgroup, which further aids in their characterization. The subgroup with the most favorable survival time is found to exhibit the fewest genomic alterations. To identify the subgroup for a new sample, classification models such as support vector machines (SVM), random forest (RF), and feed-forward neural networks (FFNN) are trained using the DE features.
Moreover, decision-level fused models are constructed by combining the prediction probabilities
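The consensus K-means step named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes toy 2-D codes standing in for the auto-encoder's latent representation, two groups instead of five, an 80% resampling fraction, 30 runs, and an average-linkage cut of the consensus matrix. All function names and parameters are hypothetical.

```python
import random

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

def consensus_matrix(points, k, runs=30, frac=0.8, seed=0):
    """Fraction of resampled runs in which each pair of points co-clusters."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(runs):
        idx = rng.sample(range(n), max(k, int(frac * n)))
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def cut_consensus(cons, k):
    """Average-linkage merging on consensus similarity until k groups remain."""
    clusters = [[i] for i in range(len(cons))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = sum(cons[i][j] for i in clusters[a] for j in clusters[b])
                sim /= len(clusters[a]) * len(clusters[b])
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    labels = [0] * len(cons)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

# Toy latent codes (assumed): two well-separated groups of five samples each.
points = [(0.1 * i, 0.1 * i) for i in range(5)] + \
         [(5.0 + 0.1 * i, 5.0 + 0.1 * i) for i in range(5)]
cons = consensus_matrix(points, k=2)
labels = cut_consensus(cons, k=2)
print(labels)
```

Pairs of samples that co-cluster in most resampled runs receive a consensus value near 1, so the final grouping reflects assignments that are stable under resampling rather than the outcome of a single K-means run.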
Pagination:	xxvii, 234 p.
URI:	http://hdl.handle.net/10603/588713
Appears in Departments:	Department of Electrical Engineering

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	267.59 kB	Adobe PDF
02_prelims page.pdf		371.69 kB	Adobe PDF
03_content.pdf		77.43 kB	Adobe PDF
04_abstract.pdf		80.08 kB	Adobe PDF
05_chapter 1.pdf		526.81 kB	Adobe PDF
06_chapter 2.pdf		262.88 kB	Adobe PDF
07_chapter 3.pdf		3.29 MB	Adobe PDF
08_chapter 4.pdf		2.62 MB	Adobe PDF
09_chapter 5.pdf		2.36 MB	Adobe PDF
10_chapter 6.pdf		2.94 MB	Adobe PDF
11_chapter 7.pdf		115.95 kB	Adobe PDF
12_annexures.pdf		330.21 kB	Adobe PDF
80_recommendation.pdf		315.08 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET