Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/510334
Title: Design and Development of Enhanced Multi modal Deep Learning Frameworks for Visual Question Answering VQA
Researcher: Manmadhan, Sruthy
Guide(s): Kovoor, Binsu C
Keywords: Computer Vision
Engineering and Technology
Information Technology
Natural Language Processing (NLP)
Visual Question Answering
University: Cochin University of Science and Technology
Completed Date: 2022
Abstract: This thesis studies a multi-modal AI task called Visual Question Answering (VQA). It spans two areas of computer science research: Computer Vision (CV) and Natural Language Processing (NLP). Because of its wide range of applications, including assistance for visually impaired people and surveillance data analysis, this AI-complete task has attracted many researchers over the last few years. Most existing works concentrate on the multi-modal feature fusion phase of VQA and ignore the effect of the individual input features. Thus, despite rapid improvements in VQA algorithm efficiency, there is still a substantial gap between the best methods and humans. The proposed research aims to design and develop deep learning models for VQA with enhanced multi-modal representations, thereby reducing the gap between human and machine intelligence. The proposed research focuses on each task in the established three-phase pipeline of VQA: image and question feature extraction, multi-modal embedding of visual and textual features, and answer generation. The methodologies used to tackle image featurization include a ranking and feature fusion framework that fuses feature vectors from pre-trained CNN image feature extractors for a given dataset, and a dedicated Convolutional Denoising Auto-encoder (CDAE) design for extracting image features from domain-specific VQA images.
Pagination: xvi,240
URI: http://hdl.handle.net/10603/510334
Appears in Departments: Department of Information Technology

Files in This Item:
File                       Description    Size       Format
01_title.pdf               Attached File  194.87 kB  Adobe PDF
02 -preliminary pages.pdf                 836.83 kB  Adobe PDF
03_content.pdf                            190.5 kB   Adobe PDF
04_abstract.pdf                           178.42 kB  Adobe PDF
05_chapter1.pdf                           908.35 kB  Adobe PDF
06_chapter2.pdf                           2.02 MB    Adobe PDF
07_chapter3.pdf                           1.12 MB    Adobe PDF
08_chapter4.pdf                           1.17 MB    Adobe PDF
09_chapter5.pdf                           1.42 MB    Adobe PDF
10_chapter6.pdf                           1.45 MB    Adobe PDF
11_chapter7.pdf                           1.79 MB    Adobe PDF
12_chapter8.pdf                           1.03 MB    Adobe PDF
14_annexures.pdf                          650.87 kB  Adobe PDF
80_recommendation.pdf                     1.23 MB    Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).