Language models for temporal decisions in health datasets

Pal, Ridam

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/579599

Full metadata record

DC Field	Value	Language
dc.coverage.spatial
dc.date.accessioned	2024-07-30T05:02:36Z	-
dc.date.available	2024-07-30T05:02:36Z	-
dc.identifier.uri	http://hdl.handle.net/10603/579599	-
dc.description.abstract	Healthcare has been undergoing a data-driven transformation, further accelerated by the COVID-19 pandemic. A significant amount of healthcare data is unstructured and underutilized. The success of Large Language Models (LLMs) in achieving human-like conversations has unlocked their potential in healthcare. For example, language models can help improve patient outcomes through temporal decision support, early warning systems, and clinical risk assessment. Through our work, we have explained how language models can assist in pandemic preparedness and support decision-making processes in critical care. Integrated frameworks incorporating machine learning, deep learning, and language models have been developed to effectively track and analyze temporal changes in unstructured healthcare data, to make informed decisions, and to enhance patient outcomes in a dynamic healthcare landscape. In this thesis, my first contribution was a deep-learning-based language model for modeling the spike region of COVID-19 genome sequences. This led to novel knowledge discovery and to StrainFlow, a real-world implementation for predicting pandemic progression, which successfully captured COVID-19 caseloads two months ahead of their occurrence. The integrative framework, which combines language models, statistical features, and machine learning to capture the temporal changes in the semantics of the genomic sequence, was deployed as a publicly available web application. In my second contribution, I constructed language models on COVID-19 scientific literature to track and predict emerging scientific evidence. The findings of this contribution illustrated that temporal changes in unsupervised word embeddings of scientific literature effectively captured and tracked new knowledge. Additionally, my work leveraged machine learning techniques and predicted emerging themes based on evolving word associations.
This was also implemented as an openly available web application called EvidenceFlow. In my third contribution, I developed language models on unstructured clinical notes data from intensive care units (ICUs) for prognosticating critical outcomes. Shock Index (SI) is a prognostic indicator commonly used in ICUs and emergency settings to assess patient outcomes. We developed a comprehensive multimodal early warning system (EWS) utilizing an integrated framework combining machine learning, deep learning, and language models. The framework leverages routinely available vital signs and clinical notes data to detect an abnormal shock index and provide timely alerts for potential deteriorations in patient health. This model is planned to be evaluated prospectively for real-world clinical decision making, which is outside the scope of my thesis. In my final contribution, I helped develop and deploy an end-to-end language model pipeline and Android application, WashKaro, for raising water, sanitation, and hygiene (WASH) awareness during the COVID-19 pandemic. This was one of the first AI-based information dissemination applications built during COVID-19. It provided bite-sized text and audio in both Hindi and English, built on text summarization, word embedding similarities, and text-to-speech technologies using advanced NLP methods. The application and research publication also demonstrated the user-feedback-based improvement of our AI model, providing pointers for designing public health intervention systems for pandemic preparedness. Overall, my thesis contributed to the development, evaluation, and deployment of language-model-based technologies in ICU and pandemic preparedness settings, specifically for future predictions and early warning systems using temporal data.
The findings contribute to advancing knowledge and methodologies while assisting medical practitioners and policymakers in effectively responding to disease outbreaks and formulating data- and AI-augmented policy for healthcare settings.
dc.format.extent	159 p.
dc.language	English
dc.relation
dc.rights	university
dc.title	Language models for temporal decisions in health datasets
dc.title.alternative
dc.creator.researcher	Pal, Ridam
dc.subject.keyword	Biology
dc.subject.keyword	Biology and Biochemistry
dc.subject.keyword	Life Sciences
dc.description.note
dc.contributor.guide	Sethi, Tavpritesh
dc.publisher.place	Delhi
dc.publisher.university	Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi)
dc.publisher.institution	Department of Computational Biology
dc.date.registered
dc.date.completed	2024
dc.date.awarded	2024
dc.format.dimensions
dc.format.accompanyingmaterial	None
dc.source.university	University
dc.type.degree	Ph.D.
Appears in Departments:	Department of Computational Biology

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	73.86 kB	Adobe PDF
02_prelim pages.pdf		217.58 kB	Adobe PDF
03_content.pdf		126.01 kB	Adobe PDF
04_abstract.pdf		137.36 kB	Adobe PDF
05_chapter 1.pdf		466.57 kB	Adobe PDF
06_chapter 2.pdf		688.85 kB	Adobe PDF
07_chapter 3.pdf		3.06 MB	Adobe PDF
08_chapter 4.pdf		1.62 MB	Adobe PDF
09_chapter 5.pdf		2.04 MB	Adobe PDF
10_annexures.pdf		463.18 kB	Adobe PDF
11_chapter 6.pdf		684.29 kB	Adobe PDF
80_recommendation.pdf		172.97 kB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET