Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/166215
Title: Understanding Semantic Association Between Images and Text
Researcher: Yashaswi Verma
Guide(s): C V Jawahar
Keywords: Cross-media analysis; Image Description; Image Retrieval
University: International Institute of Information Technology, Hyderabad
Completed Date: 17/07/2017
Abstract: The last decade has witnessed an explosion of visual data (images and videos) on the Internet, thanks to the availability of online photo-sharing websites and cheap media-capturing devices. As a result, it has become necessary to develop new technologies that can help in efficiently archiving, accessing and understanding such large collections of visual data. This dissertation is a small step towards this, where we address the problem of automatically learning the semantics of visual data (specifically images) in the form of natural text.

Though the domain of expressing image semantics in the form of text is very wide, with several associated sub-problems and applications, we target four specific challenges in this space: (1) to achieve good performance on rare labels, which are usually more informative and distinctive than the frequent ones, without compromising on the frequent labels in the image tagging task; (2) to use data from multiple sources and efficiently integrate them, both to generate semantically meaningful descriptions for images and to retrieve images given descriptive textual queries; (3) to propose a structured-prediction-based technique for the cross-modal (image↔text) retrieval task; and (4) to design a generic model that acts as a wrapper over existing cross-modal retrieval techniques by making use of additional meta-data, helping to boost their performance.

We evaluate the proposed methods on a number of popular and relevant datasets. On the image annotation task, we achieve near state-of-the-art results under multiple evaluation metrics. On the image captioning task, we achieve superior results compared to conventional methods that are mostly based on visual cues and corpus statistics. On the cross-modal retrieval task, both our approaches provide compelling improvements over baseline cross-modal retrieval techniques.
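At inference time, the cross-modal retrieval tasks described in the abstract reduce to ranking items of one modality by similarity to a query from the other modality in some shared representation space. The following is a minimal, hypothetical sketch of that ranking step only, not the thesis's actual method; the embeddings and toy data are made up for illustration:

```python
import numpy as np

def retrieve(query_vec, gallery, k=2):
    """Rank gallery rows by cosine similarity to the query and return top-k indices."""
    q = query_vec / np.linalg.norm(query_vec)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity of each gallery item to the query
    return np.argsort(-sims)[:k]      # indices sorted by decreasing similarity

# Toy 2-D "image" embeddings and one "text" query embedding (hypothetical data):
gallery = np.array([[1.0, 0.0],
                    [0.6, 0.8],
                    [0.0, 1.0]])
query = np.array([0.55, 0.8])
print(retrieve(query, gallery))       # top-2 ranked gallery indices
```

In practice the two modalities would first be projected into a common space by a learned mapping; this sketch only illustrates the subsequent nearest-neighbour ranking.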
Pagination: xiii, 158
URI: http://hdl.handle.net/10603/166215
Appears in Departments: Computer Science and Engineering
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 01_title.pdf | Attached File | 57.62 kB | Adobe PDF | View/Open |
| 02_certificate.pdf | | 30.2 kB | Adobe PDF | View/Open |
| 03_acknowledgements.pdf | | 28.73 kB | Adobe PDF | View/Open |
| 04_contents.pdf | | 74.36 kB | Adobe PDF | View/Open |
| 05_preface.pdf | | 39.74 kB | Adobe PDF | View/Open |
| 06_list of tables.pdf | | 41.01 kB | Adobe PDF | View/Open |
| 07_list of figures.pdf | | 44.85 kB | Adobe PDF | View/Open |
| 08_chapter 1.pdf | | 311.15 kB | Adobe PDF | View/Open |
| 09_chapter 2.pdf | | 689.22 kB | Adobe PDF | View/Open |
| 10_chapter 3.pdf | | 5.24 MB | Adobe PDF | View/Open |
| 11_chapter 4.pdf | | 8.3 MB | Adobe PDF | View/Open |
| 12_chapter 5.pdf | | 2.7 MB | Adobe PDF | View/Open |
| 13_chapter 6.pdf | | 7.51 MB | Adobe PDF | View/Open |
| 14_chapter 7.pdf | | 55.6 kB | Adobe PDF | View/Open |
| 15_bibliography.pdf | | 89.03 kB | Adobe PDF | View/Open |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).