Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/394299
Title: Design and development of a framework for visual attention based automatic image description generation with game theoretic optimization
Researcher: Sreela S R
Guide(s): Suman Mary Idicula
Keywords: Automatic image description generation; Computer Science; Computer Science Interdisciplinary Applications; Engineering and Technology
University: Cochin University of Science and Technology
Completed Date: 2022
Abstract: This thesis deals with the development of an automatic image description system and its applications. We consider an encoder-decoder architecture with a visual attention mechanism for image description generation. The system uses a densely connected convolutional neural network (DenseNet) as the encoder and a bidirectional LSTM as the decoder. The generated caption is further optimized using a cooperative game-theoretic search. Two applications are built on the proposed architecture: an aid for visually impaired people and medical image captioning.

An enduring vision of Artificial Intelligence is to build robots that can recognize and learn the visual world and speak about it in natural language. Object recognition has advanced significantly in recent years. Automatic image description generation is a demanding problem in Computer Vision and Natural Language Processing. Sentence-level annotation of images and videos enhances image indexing, searching, and retrieval, which is vital for Content-Based Image Retrieval (CBIR). Image description generation systems have applications in biomedicine, the military, commerce, digital libraries, education, and web searching. A description should cover the scene, actions, objects, and so on.

The automatic image description generation system proposed in this thesis consists of several phases: image feature extraction, visual attention, caption generation, and caption optimization. Image feature extraction is experimented with using multiple CNNs such as VGG, ResNet, and DenseNet; of these, DenseNet gives better performance for the automatic image description generation system.

The visual attention model is used for finding salient regions in the image. Spatial attention, channel-wise attention, and layer-wise attention are designed and developed.
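As a rough illustration of the spatial attention idea mentioned in the abstract, the following minimal sketch computes soft attention weights over image regions and a weighted context vector. It assumes a generic additive (Bahdanau-style) scoring function; the function name `spatial_attention`, the parameter names `W_f`, `W_h`, `v`, and all dimensions are hypothetical and are not taken from the thesis, whose actual attention design may differ.

```python
import numpy as np

def softmax(x):
    # Subtract the maximum for numerical stability.
    e = np.exp(x - x.max())
    return e / e.sum()

def spatial_attention(features, hidden, W_f, W_h, v):
    """Soft spatial attention over image regions (illustrative only).

    features: (N, D) array, one D-dim feature vector per image region
    hidden:   (H,) array, current decoder state
    W_f, W_h, v: hypothetical projection parameters, shapes (D, A), (H, A), (A,)
    """
    # One additive-attention score per region.
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v   # shape (N,)
    weights = softmax(scores)                             # non-negative, sums to 1
    context = weights @ features                          # shape (D,)
    return context, weights

# Toy example with random values (assumed sizes, not from the thesis).
rng = np.random.default_rng(0)
N, D, H, A = 49, 64, 32, 16
features = rng.normal(size=(N, D))
hidden = rng.normal(size=H)
W_f = rng.normal(size=(D, A)) * 0.1
W_h = rng.normal(size=(H, A)) * 0.1
v = rng.normal(size=A)

context, weights = spatial_attention(features, hidden, W_f, W_h, v)
```

Regions with larger weights are the ones such a model would treat as salient when generating the next word.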
Pagination: 150
URI: http://hdl.handle.net/10603/394299
Appears in Departments: Department of Computer Science

Files in This Item:
File                            Description    Size        Format
01_title.pdf                    Attached File  53.19 kB    Adobe PDF
02_declaration.pdf                             40.84 kB    Adobe PDF
03_certificate.pdf                             41.75 kB    Adobe PDF
04_acknowledgement.pdf                         44.6 kB     Adobe PDF
05_content.pdf                                 44.8 kB     Adobe PDF
06_list of graph and table.pdf                 44.08 kB    Adobe PDF
07_abstract.pdf                                43.87 kB    Adobe PDF
08_chapter1.pdf                                62.65 kB    Adobe PDF
09_chapter2.pdf                                422.02 kB   Adobe PDF
10_chapter3.pdf                                136.69 kB   Adobe PDF
11_chapter4.pdf                                345.89 kB   Adobe PDF
12_chapter5.pdf                                195.63 kB   Adobe PDF
13_chapter6.pdf                                309.74 kB   Adobe PDF
14_chapter7.pdf                                120.31 kB   Adobe PDF
15_chapter8.pdf                                1.4 MB      Adobe PDF
16_chapter9.pdf                                288.2 kB    Adobe PDF
17_chapter10.pdf                               46.12 kB    Adobe PDF
18_reference.pdf                               92.72 kB    Adobe PDF
80_recommendation.pdf                          59.4 kB     Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
