Please use this identifier to cite or link to this item:
http://hdl.handle.net/10603/368815
Title: | Improvement of Deep Cross Modal Retrieval using Common Sub Space Generation |
Researcher: | BHATT NIKITA MAHESHKUMAR |
Guide(s): | Ganatra Amit P |
Keywords: | Computer Science Computer Science Artificial Intelligence Engineering and Technology |
University: | Charotar University of Science and Technology |
Completed Date: | 2021 |
Abstract: | Cross-Modal Retrieval (CMR) has drawn significant attention due to the tremendous proliferation of multi-modal data and users' need for flexible retrieval. CMR can perform image-sketch matching, text-image matching, audio-video matching, and near-infrared-visual image matching, which is useful in many applications such as criminal investigation, hot-topic detection, and personalized recommendation. However, the first challenge in CMR is the selection of an appropriate distributional representation model for the text modality, one that preserves semantic similarities between words. The second challenge is the modality gap: each modality (e.g., text, image, video, audio) has different statistical properties, which prevents direct comparison for retrieval. This is resolved by generating a common sub-space that preserves semantic similarities between heterogeneous modalities.

Traditional CMR methods perform feature learning and correlation learning independently to generate a common sub-space, which does not achieve satisfactory performance. Recently, deep networks have achieved considerable success in CMR by generating binary-valued or real-valued representations in the common sub-space. Existing deep learning-based approaches use pairwise labels to generate binary-valued representations, which offer low storage requirements and fast retrieval; however, the relative similarity between heterogeneous data is ignored.

• In the first phase, a framework called "Deep Cross-Modal Retrieval (DCMR)" is proposed, which adopts the GloVe model and a Multi-Layer Perceptron (MLP) for the text modality and the VGG-F network for the image modality. To preserve relative similarities between multi-modal data, triplet labels are given as input to DCMR, which projects similar (positive) data points nearer to the given query and dissimilar (negative) data points farther away in the vector space. Extensive experiments are performed, which show that the performance of binary-valued DCMR increases a |
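The triplet objective the abstract describes can be illustrated with a minimal sketch. This is not the thesis's DCMR implementation; the function name, margin value, and toy embeddings below are hypothetical, and the loss shown is the standard triplet margin formulation that pulls a matching (positive) item toward the query embedding while pushing a non-matching (negative) item at least a margin farther away in the common sub-space.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (illustrative, not the thesis's code):
    encourage d(anchor, positive) + margin <= d(anchor, negative)."""
    d_pos = np.linalg.norm(anchor - positive)  # distance to the similar item
    d_neg = np.linalg.norm(anchor - negative)  # distance to the dissimilar item
    return max(d_pos - d_neg + margin, 0.0)

# Toy 2-D embeddings standing in for text/image features projected into
# the common sub-space; real features would come from GloVe+MLP and VGG-F.
query = np.array([1.0, 0.0])
pos = np.array([0.9, 0.1])   # semantically matching item from the other modality
neg = np.array([0.5, 0.5])   # non-matching item, currently too close to the query
loss = triplet_loss(query, pos, neg)  # positive loss: negative violates the margin
```

Minimizing this loss over many triplets is what arranges the sub-space so that, for a given query, similar cross-modal items end up nearer than dissimilar ones.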
Pagination: | |
URI: | http://hdl.handle.net/10603/368815 |
Appears in Departments: | Faculty of Technology and Engineering |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
17drce009-full thesis.pdf | Attached File | 3.08 MB | Adobe PDF | View/Open |
80_recommendation.pdf | | 298.69 kB | Adobe PDF | View/Open |
certificate.pdf | | 118.9 kB | Adobe PDF | View/Open |
chapter 1.pdf | | 773.72 kB | Adobe PDF | View/Open |
chapter 2.pdf | | 983.89 kB | Adobe PDF | View/Open |
chapter 3.pdf | | 787.26 kB | Adobe PDF | View/Open |
chapter 4.pdf | | 1.06 MB | Adobe PDF | View/Open |
chapter 5.pdf | | 656.49 kB | Adobe PDF | View/Open |
chapter 6.pdf | | 278.75 kB | Adobe PDF | View/Open |
chapter 7.pdf | | 350.8 kB | Adobe PDF | View/Open |
preliminary pages.pdf | | 503.32 kB | Adobe PDF | View/Open |
title page.pdf | | 192.15 kB | Adobe PDF | View/Open |
Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).