Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/599540
Title: Identification of potential biomarkers for esophageal squamous cell carcinoma using unsupervised machine learning
Researcher: Baruah, Bikash
Guide(s): Banerjee, Subhasish and Dutta, Manash Pratim
Keywords: Biclustering Algorithms
Community Detection Algorithms
Machine Learning
Robust Analysis
University: National Institute of Technology Arunachal Pradesh
Completed Date: 2024
Abstract: Esophageal Squamous Cell Carcinoma (ESCC) is known for its high prevalence and aggressiveness. It is often diagnosed at advanced stages due to the lack of specific symptoms, highlighting the urgent need to explore new diagnostic and therapeutic approaches. The identification of reliable biomarkers is pivotal for accurate diagnosis, prognosis, and the development of personalized treatment approaches tailored to individual patient profiles. This study draws on diverse datasets, including microarray, RNA sequencing (RNA-seq), and single-cell RNA sequencing (scRNA-seq), to explore the molecular landscape of ESCC. Because missing data is a persistent challenge in large-scale biological datasets, the study introduces a novel ensemble algorithm for missing data imputation. The algorithm integrates four techniques (k-nearest neighbor, local least squares, k-means clustering, and missForest) to fill gaps in the datasets. Comparative analyses across eight distinct datasets demonstrate the superior performance and robustness of the proposed imputation method and its ability to improve data completeness and reliability. The research then turns to biomarker discovery, using various biclustering algorithms to identify groups of genes with coherent expression patterns. In addition, EnsemBic, an ensemble biclustering algorithm, is introduced to improve the reliability and comprehensiveness of biomarker identification. Topological and biological analyses of elite genes within the identified biclusters help pinpoint potential biomarkers linked to ESCC and provide insight into the underlying molecular mechanisms of the disease. Subsequently, community detection algorithms are applied to uncover latent structures within the datasets, revealing hidden biological communities. Two novel community detection algorithms are developed and evaluated, and the results highlight their efficacy in identifying potential biomarkers.
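The ensemble imputation idea in the abstract can be illustrated with a small sketch. The abstract does not say how the four imputers are combined, so this example assumes simple averaging of their estimates. It also implements only a basic k-nearest-neighbor imputer and uses a column-mean imputer as a placeholder for the other members (local least squares, k-means clustering, missForest). All function names and the toy matrix are hypothetical and are not taken from the thesis.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is the root-mean-square difference over columns observed in both rows.
    """
    def dist(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other rows with column j observed.
            candidates = sorted(
                (dist(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap if no row has this column observed
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def column_mean_impute(rows):
    """Placeholder ensemble member: fill None with the observed column mean."""
    filled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            continue
        mean = sum(observed) / len(observed)
        for r in filled:
            if r[j] is None:
                r[j] = mean
    return filled


def ensemble_impute(rows, imputers):
    """Average the estimates of several imputers (assumed combination rule)."""
    results = [impute(rows) for impute in imputers]
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is None:
                estimates = [res[i][j] for res in results if res[i][j] is not None]
                if estimates:
                    out[i][j] = sum(estimates) / len(estimates)
    return out


# Toy expression matrix: rows are genes, columns are samples, None marks a gap.
data = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, None],
    [5.0, 6.0, 7.0],
    [1.2, 2.2, 3.4],
]
result = ensemble_impute(data, [knn_impute, column_mean_impute])
print(result[1])
```

Observed values are left untouched and only the missing cells are replaced. A weighted combination, for example one weighted by each imputer's error on artificially masked entries, would be a natural alternative to plain averaging.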
Pagination: xviii, 157
URI: http://hdl.handle.net/10603/599540
Appears in Departments: Department of Computer Science and Engineering

Files in This Item:
File                    Description      Size         Format
01_title.pdf            Attached File    23.86 kB     Adobe PDF
02_prelim.pdf                            1.37 MB      Adobe PDF
03_content.pdf                           431.59 kB    Adobe PDF
04_abstract.pdf                          418.99 kB    Adobe PDF
05_chapter 1.pdf                         372.69 kB    Adobe PDF
06_chapter 2.pdf                         470.35 kB    Adobe PDF
07_chapter 3.pdf                         1.05 MB      Adobe PDF
08_chapter 4.pdf                         1.16 MB      Adobe PDF
09_chapter 5.pdf                         1.32 MB      Adobe PDF
10_chapter 6.pdf                         509.45 kB    Adobe PDF
11_annexures.pdf                         397.87 kB    Adobe PDF
80_recommendation.pdf                    210.75 kB    Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
