Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/602993
Full metadata record
dc.coverage.spatial:
dc.date.accessioned: 2024-11-26T10:16:19Z
dc.date.available: 2024-11-26T10:16:19Z
dc.identifier.uri: http://hdl.handle.net/10603/602993
dc.description.abstract: Living creatures perceive the external environment, including their own bodies, through sensory modalities such as vision, touch, and hearing. Because the environment is so rich, a single modality rarely provides complete knowledge of a phenomenon of interest; when several senses participate in processing information, understanding improves. The growing availability of multiple modalities describing the same subject offers new degrees of freedom for fusing them. Modality fusion is the process of combining features from different sources so that each contributes complementary information. This dissertation focuses on information fusion of multimodal data to provide high accuracy, scalability, and enhanced performance across various tasks. In this research we integrate the visual and linguistic modalities to build machine learning models with improved decision making, proposing three frameworks for multimodal classification. The primary focus is on developing robust frameworks that use deep learning architectures to improve multimodal classification accuracy and efficiency. The first proposed work addresses the challenge of effectively fusing features to improve food classification accuracy. The model extracts features with a fine-tuned Inception-v4 for images and RoBERTa for the associated text, then applies early-stage fusion to integrate these features; it is evaluated on the UPMC Food-101 dataset and a newly created Bharatiya Food dataset. The second proposed work introduces the Deep Attentive Multimodal Fusion Network (DAMFN), an improvement over the previous multimodal food classification model with two significant changes: an updated feature extraction model for the visual component and an enlarged version of the newly developed dataset. The model
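The first framework described in the abstract follows a concrete pipeline: visual features from a fine-tuned Inception-v4 and text features from RoBERTa, concatenated by early-stage fusion ahead of a classification head. Below is a minimal PyTorch sketch of that idea, not the thesis's actual implementation: it assumes timm's inception_v4 and Hugging Face's roberta-base as stand-in encoders, and the fusion head, hidden size, and 101-class output (matching UPMC Food-101) are illustrative choices.

    # Hedged sketch of early-stage multimodal fusion (assumed encoders and head,
    # not the dissertation's code): image and text feature vectors are
    # concatenated before a shared classifier.
    import torch
    import torch.nn as nn
    import timm                            # assumed: supplies inception_v4
    from transformers import RobertaModel  # assumed: Hugging Face Transformers

    class EarlyFusionFoodClassifier(nn.Module):
        def __init__(self, num_classes: int = 101):  # 101 = UPMC Food-101 classes
            super().__init__()
            # Visual branch: Inception-v4 backbone; num_classes=0 strips the
            # classifier so the forward pass returns pooled features.
            self.visual = timm.create_model("inception_v4", pretrained=True,
                                            num_classes=0)
            # Text branch: RoBERTa encoder; the pooled output is used as the
            # sentence-level feature vector.
            self.text = RobertaModel.from_pretrained("roberta-base")
            vis_dim = self.visual.num_features      # 1536 for inception_v4
            txt_dim = self.text.config.hidden_size  # 768 for roberta-base
            # Early-stage fusion: concatenate both feature vectors, then classify.
            self.head = nn.Sequential(
                nn.Linear(vis_dim + txt_dim, 512),
                nn.ReLU(),
                nn.Dropout(0.3),
                nn.Linear(512, num_classes),
            )

        def forward(self, images, input_ids, attention_mask):
            v = self.visual(images)  # (B, 1536); inception_v4 expects 299x299 input
            t = self.text(input_ids=input_ids,
                          attention_mask=attention_mask).pooler_output  # (B, 768)
            fused = torch.cat([v, t], dim=1)  # early fusion of the two modalities
            return self.head(fused)           # (B, num_classes) logits

In this early-fusion design the two branches are trained jointly, so the classifier can exploit cross-modal correlations; the attentive fusion of the second framework (DAMFN) would replace the plain concatenation, though the abstract excerpt does not detail that mechanism.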
dc.format.extent: xiv, 139p.
dc.language: English
dc.relation:
dc.rights: university
dc.title: Multimodal Machine Learning for an Efficient Information Retrieval Step into Next Generation Computing
dc.title.alternative:
dc.creator.researcher: Saklani, Avantika
dc.subject.keyword: Computer Science
dc.subject.keyword: Computer Science Information Systems
dc.subject.keyword: Engineering and Technology
dc.subject.keyword: Information retrieval
dc.subject.keyword: Machine learning
dc.description.note:
dc.contributor.guide: Tiwari, Shailendra and Pannu, H S
dc.publisher.place: Patiala
dc.publisher.university: Thapar Institute of Engineering and Technology
dc.publisher.institution: Department of Computer Science and Engineering
dc.date.registered:
dc.date.completed: 2024
dc.date.awarded: 2024
dc.format.dimensions:
dc.format.accompanyingmaterial: None
dc.source.university: University
dc.type.degree: Ph.D.
Appears in Departments: Department of Computer Science and Engineering

Files in This Item:
File                    Description    Size       Format
01_title.pdf            Attached File  125.48 kB  Adobe PDF
02_prelimpages.pdf                     592.41 kB  Adobe PDF
03_content.pdf                         63.67 kB   Adobe PDF
04_abstract.pdf                        75.86 kB   Adobe PDF
05_chapter 1.pdf                       2.71 MB    Adobe PDF
06_chapter 2.pdf                       95.27 kB   Adobe PDF
07_chapter 3.pdf                       258.72 kB  Adobe PDF
08_chapter 4.pdf                       1.08 MB    Adobe PDF
09_chapter 5.pdf                       8.16 MB    Adobe PDF
10_chapter 6.pdf                       5.46 MB    Adobe PDF
11_chapter 7.pdf                       52.52 kB   Adobe PDF
12_annexure.pdf                        134.64 kB  Adobe PDF
80_recommendation.pdf                  157.29 kB  Adobe PDF


Items in Shodhganga are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence (CC BY-NC-SA 4.0).