Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/166215
Title: Understanding Semantic Association Between Images and Text
Researcher: Yashaswi Verma
Guide(s): C V Jawahar
Keywords: Cross-media analysis
Image Description
Image Retrieval
University: International Institute of Information Technology, Hyderabad
Completed Date: 17/07/2017
Abstract: The last decade has witnessed an explosion of visual data (images and videos) on the Internet, thanks to the availability of online photo-sharing websites and cheap media-capturing devices. As a result, it has become necessary to develop new technologies that can help in efficiently archiving, accessing and understanding such large collections of visual data. This dissertation is a small step towards this goal: we address the problem of automatically learning the semantics of visual data (specifically images) in the form of natural text.

Though the domain of expressing image semantics in the form of text is very wide, with several associated sub-problems and applications, we target four specific challenges in this space: (1) to achieve good performance on rare labels, which are usually more informative and distinctive than the frequent ones, in the image tagging task, without compromising on the frequent labels; (2) to use data from multiple sources and efficiently integrate them, both to generate semantically meaningful descriptions for images and to retrieve images given descriptive textual queries; (3) to propose a structured-prediction-based technique for the cross-modal (image<->text) retrieval task; and (4) to design a generic model that can act as a wrapper over existing cross-modal retrieval techniques by making use of additional meta-data, helping to boost their performance.

We evaluate the proposed methods on a number of popular and relevant datasets. On the image annotation task, we achieve near state-of-the-art results under multiple evaluation metrics. On the image captioning task, we achieve superior results compared to conventional methods that are mostly based on visual cues and corpus statistics. On the cross-modal retrieval task, both our approaches provide compelling improvements over baseline cross-modal retrieval techniques.
Pagination: xiii,158
URI: http://hdl.handle.net/10603/166215
Appears in Departments:Computer Science and Engineering
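The abstract above refers to cross-modal (image<->text) retrieval via a shared representation of the two modalities. As a rough, illustrative sketch only — using a standard CCA baseline rather than the structured-prediction approach proposed in the thesis, with synthetic placeholder features and made-up dimensions — the snippet below shows how paired image and text features can be projected into a common latent space and ranked for text-to-image retrieval.

    # Illustrative cross-modal retrieval sketch (NOT the thesis method).
    # Both modalities are mapped into a shared latent space learned with CCA,
    # then retrieval is nearest-neighbour search in that space.
    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)

    # Hypothetical paired training features: visual descriptors and text
    # (e.g. tag / bag-of-words) vectors; dimensions are arbitrary placeholders.
    n_pairs, d_img, d_txt = 200, 64, 32
    X_img = rng.normal(size=(n_pairs, d_img))
    X_txt = rng.normal(size=(n_pairs, d_txt))

    # Learn projections of both modalities into a shared low-dimensional space.
    cca = CCA(n_components=10)
    cca.fit(X_img, X_txt)
    Z_img, Z_txt = cca.transform(X_img, X_txt)

    def retrieve(query, gallery, k=5):
        """Indices of the k gallery items closest to the query (cosine similarity)."""
        q = query / np.linalg.norm(query)
        g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
        return np.argsort(-(g @ q))[:k]

    # Text -> image retrieval: embed a text query and rank all image embeddings.
    top_images = retrieve(Z_txt[0], Z_img)
    print("Top image indices for text query 0:", top_images)

The same latent space supports the reverse direction (image -> text) by swapping the roles of query and gallery; the thesis's contributions concern how such cross-modal rankings are learned and improved, not this particular baseline.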

Files in This Item:
File                        Size       Format
01_title.pdf                57.62 kB   Adobe PDF
02_certificate.pdf          30.2 kB    Adobe PDF
03_acknowledgements.pdf     28.73 kB   Adobe PDF
04_contents.pdf             74.36 kB   Adobe PDF
05_preface.pdf              39.74 kB   Adobe PDF
06_list of tables.pdf       41.01 kB   Adobe PDF
07_list of figures.pdf      44.85 kB   Adobe PDF
08_chapter 1.pdf            311.15 kB  Adobe PDF
09_chapter 2.pdf            689.22 kB  Adobe PDF
10_chapter 3.pdf            5.24 MB    Adobe PDF
11_chapter 4.pdf            8.3 MB     Adobe PDF
12_chapter 5.pdf            2.7 MB     Adobe PDF
13_chapter 6.pdf            7.51 MB    Adobe PDF
14_chapter 7.pdf            55.6 kB    Adobe PDF
15_bibliography.pdf         89.03 kB   Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).