Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/510334
Title: | Design and Development of Enhanced Multi modal Deep Learning Frameworks for Visual Question Answering VQA |
Researcher: | Manmadhan, Sruthy |
Guide(s): | Kovoor, Binsu C |
Keywords: | Computer Vision; Engineering and Technology; Information Technology; Natural Language Processing (NLP); Visual Question Answering |
University: | Cochin University of Science and Technology |
Completed Date: | 2022 |
Abstract: | This thesis studies a multi-modal AI task called Visual Question Answering (VQA). It spans two areas of computer science research: Computer Vision (CV) and Natural Language Processing (NLP). Because of its wide range of applications, including assistance to visually impaired people and surveillance data analysis, many researchers have been attracted to this AI-complete task over the last few years. Most existing works have concentrated on the multi-modal feature fusion phase of VQA while ignoring the effect of the individual input features. Thus, despite rapid improvements in VQA algorithm efficiency, there is still a substantial gap between the best methods and humans. The proposed research aims to design and develop deep learning models for the AI-complete task of Visual Question Answering with enhanced multi-modal representations, thereby reducing the gap between human and machine intelligence. The proposed research focuses on each task in the established three-phase pipeline of VQA: image and question feature extraction, multi-modal embedding of visual and textual features, and answer generation. The methodologies used to tackle image featurization include a ranking and feature fusion framework that fuses feature vectors from pre-trained CNN image feature extractors for a dataset, and a dedicated Convolutional Denoising Auto-encoder (CDAE) design for extracting image features from domain-specific VQA images. |
Pagination: | xvi,240 |
URI: | http://hdl.handle.net/10603/510334 |
Appears in Departments: | Department of Information Technology |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
01_title.pdf | Attached File | 194.87 kB | Adobe PDF | View/Open |
02 -preliminary pages.pdf | | 836.83 kB | Adobe PDF | View/Open |
03_content.pdf | | 190.5 kB | Adobe PDF | View/Open |
04_abstract.pdf | | 178.42 kB | Adobe PDF | View/Open |
05_chapter1.pdf | | 908.35 kB | Adobe PDF | View/Open |
06_chapter2.pdf | | 2.02 MB | Adobe PDF | View/Open |
07_chapter3.pdf | | 1.12 MB | Adobe PDF | View/Open |
08_chapter4.pdf | | 1.17 MB | Adobe PDF | View/Open |
09_chapter5.pdf | | 1.42 MB | Adobe PDF | View/Open |
10_chapter6.pdf | | 1.45 MB | Adobe PDF | View/Open |
11_chapter7.pdf | | 1.79 MB | Adobe PDF | View/Open |
12_chapter8.pdf | | 1.03 MB | Adobe PDF | View/Open |
14_annexures.pdf | | 650.87 kB | Adobe PDF | View/Open |
80_recommendation.pdf | | 1.23 MB | Adobe PDF | View/Open |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).