Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/368815
Title: Improvement of Deep Cross Modal Retrieval using Common Sub Space Generation
Researcher: BHATT NIKITA MAHESHKUMAR
Guide(s): Ganatra Amit P
Keywords: Computer Science
Computer Science Artificial Intelligence
Engineering and Technology
University: Charotar University of Science and Technology
Completed Date: 2021
Abstract: Cross-Modal Retrieval (CMR) has drawn significant attention due to the tremendous proliferation of multi-modal data and users' need for flexibility. CMR can perform image-sketch matching, text-image matching, audio-video matching, and near infrared-visual image matching, which are useful in many applications such as criminal investigation, hot topic detection, and personalized recommendation. However, the first challenge in CMR is the selection of an appropriate distributional representation model for the text modality that preserves semantic similarities between words. The second challenge is the modality gap: each modality (e.g., text, image, video, audio) has different statistical properties, which does not allow a direct comparison for retrieval. This is resolved by generating a common sub-space that preserves semantic similarities between heterogeneous modalities.
Traditional CMR methods perform feature learning and correlation learning independently to generate a common sub-space, which does not achieve satisfactory performance. Recently, deep networks have achieved considerable success in CMR by generating binary-valued or real-valued representations in the common sub-space. Existing deep learning-based approaches use pairwise labels to generate binary-valued representations, which offer low storage requirements and fast retrieval; however, the relative similarity between heterogeneous data is ignored.
• In the first phase, a framework called "Deep Cross-Modal Retrieval (DCMR)" is proposed, which adopts the GloVe model and a Multi-Layer Perceptron (MLP) for the text modality and the VGG-F network for the image modality. To preserve relative similarities between multi-modal data, triplet labels are given as input to DCMR, which projects similar (positive) data points nearer to, and dissimilar (negative) data points farther from, the given query in the vector space. Extensive experiments are performed, which show that the performance of binary-valued DCMR increases a
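The triplet-based common sub-space learning described in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the projection heads, feature dimensions (300-d GloVe/MLP text vectors, 4096-d VGG-F fc7 image vectors) and margin are illustrative assumptions, and the sketch stops at real-valued embeddings, omitting DCMR's binary-valued coding step.

    # Illustrative sketch only, not the DCMR implementation from the thesis.
    import torch
    import torch.nn as nn

    class ProjectionHead(nn.Module):
        """Maps modality-specific features into a shared sub-space."""
        def __init__(self, in_dim, out_dim=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, 512), nn.ReLU(),
                nn.Linear(512, out_dim))
        def forward(self, x):
            # L2-normalise so distances in the sub-space are comparable
            return nn.functional.normalize(self.net(x), dim=-1)

    # Assumed feature sizes: 300-d text features (GloVe + MLP),
    # 4096-d image features (VGG-F fc7 activations).
    text_proj = ProjectionHead(in_dim=300)
    image_proj = ProjectionHead(in_dim=4096)

    triplet_loss = nn.TripletMarginLoss(margin=0.3)  # margin is an assumption

    # One toy batch of triplets: text query (anchor), matching image
    # (positive), non-matching image (negative).
    anchor = text_proj(torch.randn(8, 300))
    positive = image_proj(torch.randn(8, 4096))
    negative = image_proj(torch.randn(8, 4096))

    loss = triplet_loss(anchor, positive, negative)
    loss.backward()  # pulls matching pairs together, pushes mismatches apart

The loss decreases only when each matching image lies closer to its text query than the non-matching image by at least the margin, which is the relative-similarity constraint the triplet labels encode.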
Pagination: 
URI: http://hdl.handle.net/10603/368815
Appears in Departments: Faculty of Technology and Engineering

Files in This Item:
File                        Size       Format
17drce009-full thesis.pdf   3.08 MB    Adobe PDF
80_recommendation.pdf       298.69 kB  Adobe PDF
certificate.pdf             118.9 kB   Adobe PDF
chapter 1.pdf               773.72 kB  Adobe PDF
chapter 2.pdf               983.89 kB  Adobe PDF
chapter 3.pdf               787.26 kB  Adobe PDF
chapter 4.pdf               1.06 MB    Adobe PDF
chapter 5.pdf               656.49 kB  Adobe PDF
chapter 6.pdf               278.75 kB  Adobe PDF
chapter 7.pdf               350.8 kB   Adobe PDF
preliminary pages.pdf       503.32 kB  Adobe PDF
title page.pdf              192.15 kB  Adobe PDF


Items in Shodhganga are licensed under Creative Commons Licence Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
