Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/166215
Full metadata record
dc.coverage.spatial: Computer Vision and Applied Machine Learning
dc.date.accessioned: 2017-08-14T07:43:12Z
dc.date.available: 2017-08-14T07:43:12Z
dc.identifier.uri: http://hdl.handle.net/10603/166215
dc.description.abstract: The last decade has witnessed an explosion of visual data (images and videos) on the Internet, owing to the availability of online photo-sharing websites and cheap media-capturing devices. As a result, it has become necessary to develop new technologies that help in efficiently archiving, accessing and understanding such large collections of visual data. This dissertation is a step towards this goal: we address the problem of automatically learning the semantics of visual data (specifically images) in the form of natural text.

Though the domain of expressing image semantics in the form of text is very wide, with several associated sub-problems and applications, we target four specific challenges in this space: (1) achieving good performance on rare labels in the image tagging task, since these are usually more informative and distinctive than the frequent ones, without compromising performance on the frequent labels; (2) using data from multiple sources, and integrating them efficiently, both to generate semantically meaningful descriptions for images and to retrieve images given descriptive textual queries; (3) proposing a structured-prediction based technique for the cross-modal (image<->text) retrieval task; and (4) designing a generic model that can act as a wrapper over existing cross-modal retrieval techniques by making use of additional meta-data, helping to boost their performance.

We evaluate the proposed methods on a number of popular and relevant datasets. On the image annotation task, we achieve near state-of-the-art results under multiple evaluation metrics. On the image captioning task, we achieve superior results compared to conventional methods that are mostly based on visual cues and corpus statistics. On the cross-modal retrieval task, both our approaches provide compelling improvements over baseline cross-modal retrieval techniques.
dc.format.extent: xiii, 158
dc.language: English
dc.relation:
dc.rights: self
dc.title: Understanding Semantic Association Between Images and Text
dc.title.alternative:
dc.creator.researcher: Yashaswi Verma
dc.subject.keyword: Cross-media analysis
dc.subject.keyword: Image Description
dc.subject.keyword: Image Retrieval
dc.description.note:
dc.contributor.guide: C V Jawahar
dc.publisher.place: Hyderabad
dc.publisher.university: International Institute of Information Technology, Hyderabad
dc.publisher.institution: Computer Science and Engineering
dc.date.registered: 3-5-2011
dc.date.completed: 17/07/2017
dc.date.awarded: 31/07/2017
dc.format.dimensions:
dc.format.accompanyingmaterial: None
dc.source.university: University
dc.type.degree: Ph.D.
Appears in Departments: Computer Science and Engineering
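
The record itself contains no code, but the cross-modal (image<->text) retrieval task named in the abstract above can be made concrete with a small sketch. The Python snippet below is an illustration only, not the dissertation's method: the feature dimensions, the random (untrained) projection matrices W_img and W_txt, and the cosine-similarity ranking are all assumptions of this sketch, broadly the kind of common-space baseline the abstract compares against.

    # Minimal, hypothetical sketch of the cross-modal (image <-> text)
    # retrieval setting described in the abstract. This is NOT the method
    # proposed in the dissertation: it is a generic common-space baseline
    # with random (untrained) linear projections, shown only to make the
    # task concrete.
    import numpy as np

    rng = np.random.default_rng(0)

    d_img, d_txt, d_common = 512, 300, 128  # hypothetical feature sizes

    # A real system would learn these projections (e.g. via CCA or a
    # structured-prediction objective); here they are random placeholders.
    W_img = rng.standard_normal((d_img, d_common))
    W_txt = rng.standard_normal((d_txt, d_common))

    def embed(features, W):
        """Project features into the common space and L2-normalise rows."""
        z = features @ W
        return z / np.linalg.norm(z, axis=1, keepdims=True)

    # Toy database of 1000 image feature vectors, and one textual query.
    image_feats = rng.standard_normal((1000, d_img))
    query_feat = rng.standard_normal((1, d_txt))

    img_emb = embed(image_feats, W_img)
    txt_emb = embed(query_feat, W_txt)

    # Cosine similarity reduces to a dot product after normalisation;
    # rank all images against the text query and keep the top five.
    scores = (txt_emb @ img_emb.T).ravel()
    top5 = np.argsort(-scores)[:5]
    print("Top-5 image indices for the query:", top5)

Per the abstract, the contribution lies entirely in how the shared space or ranking is learned; replacing the random projections above with learned ones is where techniques such as the proposed structured-prediction approach depart from this baseline.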

Files in This Item:
File                      Size       Format
01_title.pdf              57.62 kB   Adobe PDF
02_certificate.pdf        30.2 kB    Adobe PDF
03_acknowledgements.pdf   28.73 kB   Adobe PDF
04_contents.pdf           74.36 kB   Adobe PDF
05_preface.pdf            39.74 kB   Adobe PDF
06_list of tables.pdf     41.01 kB   Adobe PDF
07_list of figures.pdf    44.85 kB   Adobe PDF
08_chapter 1.pdf          311.15 kB  Adobe PDF
09_chapter 2.pdf          689.22 kB  Adobe PDF
10_chapter 3.pdf          5.24 MB    Adobe PDF
11_chapter 4.pdf          8.3 MB     Adobe PDF
12_chapter 5.pdf          2.7 MB     Adobe PDF
13_chapter 6.pdf          7.51 MB    Adobe PDF
14_chapter 7.pdf          55.6 kB    Adobe PDF
15_bibliography.pdf       89.03 kB   Adobe PDF


Items in Shodhganga are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence (CC BY-NC-SA 4.0).