Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/166215
Title: Understanding Semantic Association Between Images and Text
Researcher: Yashaswi Verma
Guide(s): C V Jawahar
Keywords: Cross-media analysis
Image Description
Image Retrieval
University: International Institute of Information Technology, Hyderabad
Completed Date: 17/07/2017
Abstract: The last decade has witnessed an explosion of visual data (images and videos) on the Internet, thanks to the availability of online photo-sharing websites and cheap media-capturing devices. As a result, it has become necessary to develop new technologies that can help in efficiently archiving, accessing and understanding such large collections of visual data. This dissertation is a small step towards this goal: we address the problem of automatically learning the semantics of visual data (specifically images) in the form of natural text.

Though the domain of expressing image semantics in the form of text is very wide, with several associated sub-problems and applications, we target four specific challenges in this space: (1) to achieve good performance on rare labels, which are usually more informative and distinctive than the frequent ones, in the image tagging task, without compromising on the frequent labels; (2) to use data from multiple sources and efficiently integrate them, both to generate semantically meaningful descriptions for images and to retrieve images given descriptive textual queries; (3) to propose a structured-prediction-based technique for the cross-modal (image<->text) retrieval task; and (4) to design a generic model that can act as a wrapper over existing cross-modal retrieval techniques by making use of additional meta-data, helping to boost their performance.

We evaluate the proposed methods on a number of popular and relevant datasets. On the image annotation task, we achieve near state-of-the-art results under multiple evaluation metrics. On the image captioning task, we achieve superior results compared to conventional methods that are mostly based on visual cues and corpus statistics. On the cross-modal retrieval task, both our approaches provide compelling improvements over baseline cross-modal retrieval techniques.
Pagination: xiii,158
URI: http://hdl.handle.net/10603/166215
Appears in Departments:Computer Science and Engineering
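The abstract above refers to cross-modal (image<->text) retrieval via a shared representation of the two modalities. As a rough, illustrative sketch only — using a standard CCA baseline rather than the structured-prediction approach proposed in the thesis, with synthetic placeholder features and made-up dimensions — the snippet below shows how paired image and text features can be projected into a common latent space and ranked for text-to-image retrieval.

    # Illustrative cross-modal retrieval sketch (NOT the thesis method).
    # Both modalities are mapped into a shared latent space learned with CCA,
    # then retrieval is nearest-neighbour search in that space.
    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)

    # Hypothetical paired training features: visual descriptors and text
    # (e.g. tag / bag-of-words) vectors; dimensions are arbitrary placeholders.
    n_pairs, d_img, d_txt = 200, 64, 32
    X_img = rng.normal(size=(n_pairs, d_img))
    X_txt = rng.normal(size=(n_pairs, d_txt))

    # Learn projections of both modalities into a shared low-dimensional space.
    cca = CCA(n_components=10)
    cca.fit(X_img, X_txt)
    Z_img, Z_txt = cca.transform(X_img, X_txt)

    def retrieve(query, gallery, k=5):
        """Indices of the k gallery items closest to the query (cosine similarity)."""
        q = query / np.linalg.norm(query)
        g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
        return np.argsort(-(g @ q))[:k]

    # Text -> image retrieval: embed a text query and rank all image embeddings.
    top_images = retrieve(Z_txt[0], Z_img)
    print("Top image indices for text query 0:", top_images)

The same latent space supports the reverse direction (image -> text) by swapping the roles of query and gallery; the thesis's contributions concern how such cross-modal rankings are learned and improved, not this particular baseline.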

Files in This Item:
File                        Size       Format
01_title.pdf                57.62 kB   Adobe PDF
02_certificate.pdf          30.2 kB    Adobe PDF
03_acknowledgements.pdf     28.73 kB   Adobe PDF
04_contents.pdf             74.36 kB   Adobe PDF
05_preface.pdf              39.74 kB   Adobe PDF
06_list of tables.pdf       41.01 kB   Adobe PDF
07_list of figures.pdf      44.85 kB   Adobe PDF
08_chapter 1.pdf            311.15 kB  Adobe PDF
09_chapter 2.pdf            689.22 kB  Adobe PDF
10_chapter 3.pdf            5.24 MB    Adobe PDF
11_chapter 4.pdf            8.3 MB     Adobe PDF
12_chapter 5.pdf            2.7 MB     Adobe PDF
13_chapter 6.pdf            7.51 MB    Adobe PDF
14_chapter 7.pdf            55.6 kB    Adobe PDF
15_bibliography.pdf         89.03 kB   Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).