Design and Development of Enhanced Multi modal Deep Learning Frameworks for Visual Question Answering VQA

Manmadhan, Sruthy

Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/510334

Title:	Design and Development of Enhanced Multi modal Deep Learning Frameworks for Visual Question Answering VQA
Researcher:	Manmadhan, Sruthy
Guide(s):	Kovoor, Binsu C
Keywords:	Computer Vision; Engineering and Technology; Information Technology; Natural Language Processing (NLP); Visual Question Answering
University:	Cochin University of Science and Technology
Completed Date:	2022
Abstract:	This thesis studies a multi-modal AI task called Visual Question Answering (VQA). It spans two areas of computer science research: Computer Vision (CV) and Natural Language Processing (NLP). Owing to its wide range of applications, including assistance for visually impaired people and surveillance data analysis, this AI-complete task has attracted many researchers over the last few years. Most existing works have focused on the multi-modal feature fusion phase of VQA, ignoring the effect of the individual input features. Thus, despite rapid improvements in VQA algorithm efficiency, there is still a substantial gap between the best methods and humans. The proposed research aims to design and develop deep learning models for the AI-complete task of Visual Question Answering with enhanced multi-modal representations, thereby reducing the gap between human and machine intelligence. The proposed research focuses on each task in the established three-phase pipeline of VQA: image and question feature extraction, the multi-modal embedding of visual and textual features, and answer generation. The methodologies used to tackle image featurization include a ranking and feature fusion framework to fuse feature vectors from pre-trained CNN image feature extractors for a dataset, and a dedicated Convolutional Denoising Auto-encoder (CDAE) design for extracting image features from domain-specific VQA images.
Pagination:	xvi,240
URI:	http://hdl.handle.net/10603/510334
Appears in Departments:	Department of Information Technology

Files in This Item:

File	Description	Size	Format
01_title.pdf	Attached File	194.87 kB	Adobe PDF
02 -preliminary pages.pdf		836.83 kB	Adobe PDF
03_content.pdf		190.5 kB	Adobe PDF
04_abstract.pdf		178.42 kB	Adobe PDF
05_chapter1.pdf		908.35 kB	Adobe PDF
06_chapter2.pdf		2.02 MB	Adobe PDF
07_chapter3.pdf		1.12 MB	Adobe PDF
08_chapter4.pdf		1.17 MB	Adobe PDF
09_chapter5.pdf		1.42 MB	Adobe PDF
10_chapter6.pdf		1.45 MB	Adobe PDF
11_chapter7.pdf		1.79 MB	Adobe PDF
12_chapter8.pdf		1.03 MB	Adobe PDF
14_annexures.pdf		650.87 kB	Adobe PDF
80_recommendation.pdf		1.23 MB	Adobe PDF

Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Shodhganga : a reservoir of Indian theses @ INFLIBNET